PARALLEL PROCESSING OF MOLECULAR DYNAMICS
SIMULATION IN A DISTRIBUTED COMPUTING
ENVrRONMENT AND APPLICATION TO THE
MODIFIED EMBEDDED ATOM METHOD
AND NANOINDENTATION
By
DAVID L. STOKES
Bachelor of Science
Oklahoma State University
Stillwater, Oklahoma
1998
Submitted to the Faculty of the
Graduate College of the
Oklahoma State University
in pal1ial fulfillment of
the requirements for
the Degree of
MASTER OF SCIENCE
December, 2000
PARALLEL PROCESSING OF MOLECULAR DYNAMICS
SIMULATION IN A DISTRIBUTED COMPUTING
E VIRONME T AND APPLICATION TO THE
MODIFIED EMBEDDED ATOM METHOD
AND NANOINDENTAnON
Thesi. Approved:
SAdt:
•
\I
ACKNOWLEDGEMENTS
Thanks are due to my wife, Kecia Stokes, for her patience, support, and
understanding. Thank also to my daughter, Drew  while he i far too young to
realize it, she has been a great source of inspiration.
I also want to thank my advisor, Dr. Ranga Komanduri, for his help and
advice. I have had the privilege of working for Dr. Komanduri for many years and I
feel that this experience has been as valuable a part of my education as my
coursework. Thanks to Dr. Lionel Raff for the many meetings and frequent and
invaluable uggestions. Thanks also to Dr. Young and Dr. Price for serving on my
committee.
Thanks are also due to my fellow research assistants. Naga Chandrasekaran
has led all of us by example, and has paved the way with regards to molecular
dynamics research at our lab. Kalyan Mavuletti and I worked in conjunction in the
early stages of the MEAM project  without this initial group effort, my own work
would have been infinitely more difficult. And thanks to Matt Lee, architect of
MDWulf, without whom there would be no distributed computing cluster.
III
Chapter
TABLE OF CONTE TS
Pag
I. I:NTRODUCTION 1
1.1 Molecular Dynamics Modeling of UltraPreci ion Machining
and Grinding I
1.2 Parallel Proce sing Via the Message Passing Paradigm 4
J.3 The Modified Embedded Atom Method (MEAM) 8
II. LITERATURE REVIEW 12
2.1
')')
2.3
Parallel Molecular Dynamics Modeling 12
MD Modeling of NanoIndentation 17
Development of the MEAM 20
III. PROBLEM STATEMENT 25
IV. THEORY OF MD SIMULATION , .. , 28
4.1 Introduction " , 28
4.2 Units Used in the MD Model. 28
4.3 Calculation of Interatomic Forces 30
4.4 Integration of the Equations of Motion 32
4.5 Boundary and Thermal Con iderations 33
4.6 Crystal Relaxation , 35
4.7 Algorithm Design Issues 37
V. THE MODIFIED EMBEDDED ATOM METHOD (MEAM) 41
5.1 Introduction , 42
5.2 Formulation of the MEAM for Interactions Between
Like Materials , , 43
5.3 Formulation of the MEAM for Interactions Between
Unlike Materials 47
5.4 Angular Screening , , 48
5.5 Validation of the Implementation 53
VI. RELAXATION WITH THE CONJUGATE GRADIENT METHOD 56
6.1 Introduction 56
6.2 Optimization of the Relaxation Process 57
6.3 The Conjugate Gradient Method 59
IV
Chapter Page
VII. PARALLEL ALGORITHM DESIO FOR MD 63
7.1 Introduction 63
7.2 The Parallel Computing Environment 64
7.3 Design and Performance of the Parallel Algorithm 65
VIII. MD MODELING OF NANOMETRIC INDENTATION
WITH THE MEAM , 73
8.1 Introduction 73
8.2 Indentation with MEAM at the Interface 74
8.3 Indentation with Morse at the Interface 79
IX. CONCLUSIONS , " , 83
9.1 Summary of Results 83
9.2 FutureWork 86
REFERENCES 88
LIST OF TABLES
Table Pag
4.1 Physical Units of the MD Model. 29
8.1 Comparison of MD and Theoretical Strengths for Various Materials 80
VI
LIST OF FIGURES
~~re P~
4.1 Illustration of Angular Screening Method 36
5.1 lHustration of Angular Screening Method 48
5.2 Conservation of Energy for Cu System 52
5.3 Conservation of Energy for AgFe System 53
5.4 Backintegration for Cu System 54
5.5 Backintegration for AgFe System 54
6. I Run Times of Significant Functions in the MEAM Code 57
6.2 Comparison of the Damped Trajectory and Conjugate Gradient Methods 61
7.1 Configuration of the Parallel Processing Environment 64
7.2 Process Flow of the Root Algorithm 69
7.3 Process Flow of the Slave Algorithm 70
7.4 Comparison of Run Times for ParalleJ and Sequential Algorithms 71
~.l Indentation of Fe with Diamond Using MEAM at the Interface 76
8.2 Indentation of Fe with Tungsten Using MEAM at the Interface 77
8.3 Indentation of Cu Using Morse at the Interface 8 J
8.4 Indentation of Fe Using Morse at the Interface 82
VII
CHAPTER 1
INTRODUCTION
1.1 Molecular Dynamics Modeling of Ultra Precision Machining and Grinding
Advancements in technology are due in great part to the ability to achieve
increasingly finer machining tolerances. Ultra Precision Machining CUPM) and Ultra
Precision Grinding (UPG) are terms which refer to such highly precise material removal
techniques. These methods are used in a number of important applications, including the
manufacture of highprecision optics, hard drives, and wafers for the semiconductor
industry.
As for any physical process of importance to industry or the scientific community,
having an accurate theoretical anclJor computerbased model of UPM and UPG is highly
desirable. Experimental investigation can be subject to high co ts, experimental error,
concerns about the repeatability of observed phenomena, and even potential health and
environmental issues. An accurate computer model can augment experimental research
or even, once the model has reached maturity, partially replace experimental research or
testing.
For example, billion of dollars worth of de ign deci ions are ba ed partly or
wholly on computerbased analysis with the finite element method (FEM). De pite the
pioneering work of Hrenikoff, Turner, Clough, Arhyris, et. at. [1] a few decades ago,
application of FEM to complex problems awaited advancements in computer cience and
computer engineering. Today, with the availability of inexpen ive and pow rful
computers, FEM is recognized as a viable simulation tool in the areas of deformation and
stress analysis, heat transfer, and flow problems.
Molecular dynamics (MD) modeling has the potential to be as significant as FEM.
MD, one of the most widely employed simulation methods for studying the condensed
state, is concerned with the interactions and behavior of individual atoms. Atoms are
treated as point masses having defined potential interactions with neighboring atom .
The specified potential model is used to calculate forces on all atoms in the system, and
the Hamiltonian equations of motion are integrated for each atom.
Because MD describes the behavior of materials at the atomic level, MD can be
used to model UPM and UPG. When small amounts of material are removed during
machining, the discrete nature of matter becomes important  at this level, continuum
mechanicsbased methods (e.g. FEM) fail.
Specifically, molecular dynamics is an ideal simulation tool for UPM and UPG
for two reasons:
2
• MD offers the same potential advantages as other simulation tools  the co t and
complexity of experimental research can be reduced or eliminated altogether.
UPM and UPG is performed using machine tools which are highly preci e and
rigid. Contamination, vibration, and other external influences must be avoided,
and single crystal diamond tools and diamondabrasive grinding wheel mu t be
used. Such equipment is prohibitively expensive for many researcher.
• MD can model processes which are impractical or impo ible to inve tigate
experimentally. Even with stateoftheart technology, precise control and
observation of material removal at the atomic level i not possible. MD,
however. becomes increasingly viable when fewer numbers of atoms are
considered. MD can be used, therefore, to promote and guide the development
of new technology to achieve increasingly more precise machining tolerances.
As mentioned above, molecular dynamics imulation of materials becom more
viable as the scale of the problem is reduced. Thi is due to the fact that, a with FEM in
the 1960's, the application of MD to realworld problems i limited by the speed of
modern computers. With six coupled differential equations to be integrated per atom per
time step, MD is computationally expensive. As a result, manufacturing process
engineers employing MD for their research are often forced to use twodimensional
models and/or very high cutting velocities, to machine over limited cutting distances and
with limited depths of cut, or to instigate clever techniques such as Length Restricted
Molecular Dynamics (LRMD) as developed by Chadrasekaran [2].
3
Despite these limitations, nanoindentation molecular dynamics experiment can
be used to effectively model the action of a single abra ive grain during grinding and the
behavior of workpiece materials in general. Direct observation of the mechanism of chip
formation and workpiece deformation at this level is, as mentioned above, difficult, time
consuming, and expensive. Variation of experimental parameter , such as shape of the
indenter, indentation depth and speed, and crystal orientation can be achieved precisely
and easily with MD, without the need for expensive diamond tooling, single cry tal
samples, or machine tools.
The computationally intensive nature of molecular dynamics simulation can be
partially compensated for hy the use of parallel processing. The effective application of
parallel computer hardware and algorithm design to MD, and to the nbody problem in
general, is a challenging and potentially expensive endeavor. Molecular dynamic
simulation is inherently parallelizable, but at a high level  vector processing, the basis
for the success of supercomputers, does not apply well. An alternative approach is
necessary, both in terms of hardware design and algorithm design.
1.2 Parallel Processing via the Message Passing Paradigm
While parallel computing in a distributed computing environment incorporating
the "componentsofftheshelf' (COTS) philosophy and the message passing paradigm is
still a new and rapidly evolving field, one thing remains true  for a fraction of the cost,
researchers can build their own high performance computing environment which
4
approaches the performance of expen ive multiprocessor machines made by SGI, Cray
liM, and Compaq.
Because increased performance from a single processor is limited primarily by the
ability to dissipate waste heat, computer scientists have resorted to using multiple
processors to tackle larger and larger engineering and scientific simulation models. To
quote an amusing analogy: If one wants to pull a heavy cart, it' easier to team up
several horses rather than breed a single "superhorse".
This is the approach taken by the developers of traditional supercomputer, uch
as Cray. Two or more processors are housed in the same case which share the same
memory and storage resources. At this point. however, the "horse and cart" analogy fail.
While it is easy for many horses to efficiently subdivide the total workload, things are
much more difficult in the computer science world.
There are three basic approaches to this difficulty:
• One is to develop "smart" compilers which take sequential code (a "normal"
program in which each computation is performed only after the previou
computation has been competed) and attempt to identify which steps are
independent and can be executed simultaneously on different proces ors. Thi
approach is called a "vector" approach. For example, it is common to perform
the same operation on each member of an array (i.e. vector)  a "smart" compiler
5
can take sequential code and ubdivide this work among eparate processor.
with no extra effort on the part of the programmer.
• Another method is to u e a true parallel architecture, uch a a tran puter. In thi
method, many very simple processors, each with their own small memory pace,
are interconnected on the same circuit board. Programming for thi type of
architecture necessitate the u e of custom, low level languages, and is frequently
exceedingly difficult for complex problems, such as MD.
• The final method is to use multiple fullfeatured processors, each with their own
separate memory space. This architecture requires the manual parallelization of
the desired simulation model, such as subdividing the grid or simulation space
over the many processors. Algorithm design is governed by the desire to
maximize processor independence  when data exchange is necessary. it is done
via "message passing", whereby the programmer specifies exactly how and when
to exchange data.
A big advantage to multiple processors in a single machine (frequently called
SMPs, or "symmetric multiprocessors") is that they share the same memory pace. That
is, if one processor needs the results of a computation performed by another processor,
the data is right there in local RAM. As a result, interprocessor communication is fast.
The disadvantage to this approach is that no compiler can automatically parallelize code
as well as an experienced programmer who is willing to invest the time to do it him elf.
Also, as more processors share the same address space, it becomes increasingly more
difficult to manage access to memory and to maintain fault tolerance. In addition, many
6
problems (of which MD is a pnme example) do not greatly benefit from vector
processing.
As a result, the MD researcher looking for the be t performance out of hi
supercomputer is often forced to abandon vector processing and take the message pas ing
approach on his supercomputer. Here, the programmer manually parallelizes the code at
a high level. A parallel algorithm is designed in which independent processing paths are
identified and interprocess communication is manually performed via mes ages. In this
approach the programmer has complete control but the programming complexity and
debugging difficulties can be extreme. This method of algorithm design is also known as
the MultipleInstruction, MultipleData (MIMD) method.
While message passing eliminates the primary difficulty of using SMPs for high
performance computing, it also eliminates the primary advantage  shared memory space.
For this reason, processors housed in separate cases with separate memory and storage
spaces may just as well be used. This approach results in slower communication of data
between processors, but with careful algorithm design, this disadvantage is overshadowed
by the advantages.
Such a collection of separate computers is called a distributed computing
environment, or a Beowulf cluster. This is potentially much less expensive than a high
performance SMP especially if the designer follows a COTS approach  thi simply
7
refers to the use of inexpensive and readily available component, uch a 686 lnt I
processors and other components typically used in common home and bu ine PCs.
It's possible to have over one hundred such PCs (or nodes) in a clu ter. At a co t
of a few to several hundred dollars per node, a 128 node cluster, for the right application,
would have a much better cost/performance ratio as compared to a SMP with the arne
number of processors. Also, a cluster is easily expandable wherea an SMP i not.
Many issues come into play for such a large cluster. Fast message passing and
robustness is crucial. There is a great deal of research going on in the areas of hardware,
operating system and network configuration, and algorithm design. While these issues in
and of themselves command the full attention of computer scientists, motivated
researchers in other fields may, with work, develop enough basic knowledge to reap the
benefits of the distributed computing concept for their simulation work.
1.3 The Modified Embedded Atom Method (MEAM)
When implementing MD simulations, the choice of potential model i allimportant.
There are many potential models which have been developed over the years,
primarily by physical and organic chemists  Morse, LennardJones, Embedded Atom
Method, Modified Embedded Atom Method, Tersoff, and Brenner, just to name a very
few. The majority of available potential models are defined for only a few materials.
Others may be in a parameterized form, and can be applied to many different materials,
8
but have been demonstrated to give poor re ults in some case, such as the Mor e
potential with BCC materials [3].
The Morse potential has been the basi for much of the re earch carried out by Dr.
Raff and Dr. Komanduri to study nanometric cutting, tribological proce se, and
nanomaterials properties. The Morse potential has the following advantages and
disadvantages:
• As a simple pair potential, it is easy to implement and computationally much less
intensive than other potential models.
• It is parameterized, and can model many FCC transition metals by variation of a
few parameters.
• The Morse potential, and pairpotentials in general, give vacancyformation energy
to be the same as the cohesive energy, a major error for transition metal [4].
• The elastic properties of simulated materials are poorly represented due to th lack
of volumedependence in the model [5].
• The Morse potential does not prescribe parameters for interactions between
different materials, so special consideration must be given to tool/workpiece
interaction.
• Because of these facts, and the recent development and refinement of more
sophisticated potential models, the Morse potential has come under increasing
attack by members of the research community.
9
The Modified Embedded Atom Method, or MEAM, overcomes the di advantage
of pair potentials. Based on density functional theory and the quasiatom theory, it
includes many body effects and accounts for angular components of bonding. It is al 0
parameterized, and can model many FCC, BCC, and HCP transition metals, alkali metaJs,
and even silicon, diamond, and germanium. Fundamental propertie of matter are welJmodeled
with the MEAM, such as vacancy formation energy, sublimation energy, elastic
properties, surface potential effects, and crack and dislocation effects [6]. The MEAM
even specifically provides for interactions between unlike materials.
For these reasons, the MEAM has the potential to allow comprehensi ve and
comparative modeling of many different materials and for workpiece/tool interactions.
The current work focuses, in part, on the application of the MEAM to indentation of
various materials. Results are compared to available experimental and theoretical values,
and to previous work with the Morse potential.
In conjunction with the above mentioned advantages, the MEAM has one major
disadvantage  it is computationally much more demanding than pair potentials. As a
result, the MEAM is an ideal candidate for parallel processing. While the parallel
processing techniques developed in this study are applicable to any potential model, they
were developed and tested using the MEAM the model indentation.
10
t
The computationally intensive nature of the MEAM potential model has prompted
additional investigation into methods of decreasing simulation time. One such technique
discussed in this work is the use of the conjugate gradient method for cry tal relaxation.
II
CHAPTER 2
LITERATURE REVIEW
2.1 Parallel Molecular Dynamics Modeling
Many architectural enhancements and parallel processing philosophies have been
developed over the years. Each of these methods is intended to circumvent the fact that
computer hardware is fundamentally clockrate limited and that distribution of the
computing workload over multiple processors can potentially offer a huge efficiency
boost. Work in this area can be divided into roughly three hardwaredesign categorie :
transputer arrays, symmetricmultiprocessor (SMP) supercomputers, and distributed
clusters. Each of these areas require a very different approach to the design of efficient
and effective algorithms. Each specific type of architecture may also be best suited for
particular applications.
Transputer arrays gained popularity throughout the 1980s and into the early
1990s. Transputers are an example of singleinstructionmultipledata (SIMD)
parallelization at the hardware level. Multiple simple processors (often only I bit) are
interconnected on the same circuit board. Each of these processors has its own RAM,
12
apart from the RAM of other proces or . Parallelization at the data level i accomplished
by spreading out the algorithm over these processors.
Transputers were applied to molecular dynamics with the realization that vector
processing yields limited speed increases. The first proposed implementation of
transputers to MD was by Slaets and Travieso [7] in 1989. It this paper, an MD
algorithm was designed which would apply well to a transputer. Use of the linkedcell
method without neighbor lists was suggested. Each proces or would perform the
computation for one cell. A ring interconnection between processors was suggested.
Performance analysis indicated that this algorithm on T800 transputer would
exceed that of a Cray XMP for larger systems of atoms. It is interesting to note,
however, that these researchers did not implement their proposed algorithm. This was
due to the fact that the T800 transputer had an insufficient amount of memory for MD
simulation, and that the JowIevel programming necessary on a transputer for a uch a
complex problem as MDis exceedingly tedious.
Later, in 1995, Woods and Alford [8] implemented MD on an array of transputer
nodes configured as a scaleable torus. By interconnecting several transputers, a multipleinstruction
multipledata (MIMD) approach was possible. In this paper, a highly detailed
performance analysis was given to determine optimum hardware configuration, algorithm
design, interconnection topologies, and communication procedures. A great deal of
13

insight was derived into the efficient and scaleable implementation of MD model on
networked transputers.
While there are still some adherents today to this technology, for the mo t part,
transputers have been left by the wayside. The main problem with parallel algorithm
development for transputers is the lack of programming tools. In most cases, a custom
transputer language, OCCAM, is used. Use of this language is tedious, as lowlevel
subdi vision of the algorithm must be done manually for optimum performance.
Researchers have generally preferred parallel architectures which allow the use of higher
level languages, such as Fortran or C.
A second type of parallel architecture in which MD has been implemented is
traditional vector processing supercomputers. To take advantage of vector processing,
careful modification of the MD algorithm is necessary. This was first demonstrated by
Heyes and Smith [9] in 1987, and carried out by Rapaport (10] in 1988.
Complications arise from the fact that when using the linked cell (LC) method.
the typical vector length is of the order of the cell occupancy list, and is too hort for
efficient vectorization. Rapaport devised the layered linked cell (LLC) method, whereby
larger flat cells are used to span the simulation space. In this layered link cell algorithm,
the inner loops are completely vectorizable and of a length on the order of the number of
cells. This LLC method, when carefully optimized, was the fastest algorithm for vector
supercomputers reported in the literature as of the late 1980s.
14

In ] 994, Everaers and Kremer [11) made an improvement on the LLC method. A
fine grid is superimposed over the layered linked cells, fine enough such that the e small
cells are occupied by at most one particle. This obviate the need to distingui h between
inter and intracell interactions, and allows for longer loops, which are more ea ily
vectorizable. Additional complexity is imposed by the necessity to search multiple cells
for neighbor interactions, but with careful implementation, Everaers and Kremer reported
a three times speedup as compared to the simple LLC method for uniformly di tributed
particles. With density variation, the amount of speedup decreases, but wa still
significant.
Despite these efforts, it is impossible to escape the fact that higherlevel MIMDstyle
parallelization is much better suited for efficient MD simulation. MIMDstyle
parallelization for MD u ually involves spatial decompo ition  that i , each proce sari
a signed to a particular region of space and handles all computation for atom in this
region of space.
An early example of such an approach is given by the work of Liem et. al. [] 2] in
1991. While a transputer array was used in this work, the focus was to develop an
algorithm which would work on any distributed memory type of cluster. In this work, a
method of using one processor per linked celJ was described. The benefits and
requirements of this method were enumerated. These included the need for each
processor to communicate only with the processors which represent contiguous cells, and
J5
the need to do particle reallocation, in the event that particle move into a subregion
associated with another processor.
This approach has been the basis for all subsequent MIMDstyle parallel shortrange
MD algorithms. Later work has focused on load balancing, portability, and fa t
message passing. Srinivasan et. al. [13,14J reported on an intere ting project in 1997 to
address the lack of portability suffered by most parallel MD implementations. A runtime
library, called Adhara, was written specifically for parallel MD simulation over
distributed nodes. Written in C, able to be compiled on any system, it provided
functionality for atom definition, partitioning and distribution over processors, and other
features to allow programmers to take advantage of the parallelism inherent in MD. This
library depended, however, on an SMPtype of processor arrangement.
Portable parallel programming, with the capacity for load balancing and fault
tolerance, and with an emphasis on fast message passing, is implemented most widely
today using the Parallel Virtual Machine (PVM) or Message Pa sing Interface (MPI)
standard libraries. PVM was developed ny AI Geist et. al. [15] at the Oak Ridge National
Lab in 1993 [15]. MPI was first developed in 1994 by the MPI Forum [16,17,18]. Each
of these is a set of libraries which may be compiled and used on a variety of architecture
types. With these highlevel libraries, portable and efficient parallel program are easier
to develop. A free and highly portable implementation of MPI called MPICH was used
as the basis for parallel algorithm development in this study.
16

2.2 MD Modeling of NanoIndentation
Alder and Wainwright [19,20], of the Lawrence Radiation Laboratory, conducted
the first MD simulation studies in the late 1950s. They initiated numerical simulation of
many atom or many molecule systems because of the great mathematical difficulty
involved in analytical treatment. Alder and Wainwright recognized the potential of MD
simulation to generate more detailed information as compared to actual experimentation.
Also, MD offered the possibility of studying problems for which present theory had
difficulties addressing, such as the behavior of a pure liquid, phase transactions, or the
nucleation problem.
It is interesting to note that, due to the limitations of computers at the time of
Alder's and Wainwright's pioneering work, it was necessary to used a vastly simplified
potential model, the squarewell potential. Also, the models used were limited to 500
atoms  even then, onehalf hour of processing time was required with an average of only
one collision per atom using a Univac or ffiM 704 computer. These early MD models
were found to accurately describe some equilibrium properties of the simulated hard
spheres.
Cutting and indentation was first addressed by Belak et. al. [21] at the Lawrence
Livermore National Laboratory in the late 1980s and early 1990s. Using a SPRINT 64transputer,
they were able to run million atom MD simulations of nonequilibrium
indentation. The LennardJones potential in conjunction with the Embedded Atom
17

Method (BAM) was used to simulate twodimensional crystals of copper and nickel.
Microhardness and yield strength were found to be between 10% to 25% of the hear
modulus, which is in reasonable agreement with the theoretical values given in Hertzberg
[22].
In 1993, Belak et. at. [23] also investigated nanoindentation of copper and silver.
In these simulations, indentation of (I 1 1) surfaces with a triangular diamond tip were
performed. Initial attraction between the indenter and the work material occurred, and
elasticplastic deformation of the substrate was observed. Critical yielding with a
corresponding decrease in loading force was observed after penetration by the indenter to
a depth of about 1.5 atomic layers. With greater indentation depth, pile up of atoms
around the tool occurred.
In 1990, Landman et. al. [24] published experimental and simulation re ults on
the indentation of a gold surface by a nickel indenter, using atomic force microscopy
(AFM) and MD. They reported that, as the indenter approaches the gold surface, atomicscale
instability occurs causing a jumptocontact (JC) phenomenon. At contact,
adhesioninduced flow was observed, followed by plastic yielding and lip in the urface
region of the gold substrate. Withdrawal of the indenter resulted in wetting of the nickel
tip by gold atoms, formation of an atomically thin connective neck, followed by fracture
of the neck. AFM analysis could not confirm many of these features predicted by the
simulation.
18

Landman, Leudtke, and Ringer [25] expanded the above inve tigation in 1991 to
include interionic (CaF2) and thin alkane films adsorbed onto a gold urface. The e
molecules were represented by "pseudoatoms" using the EAM potential. Tip liding
was also performed, to simulate the mechanism by which AFM is conducted. 0 cillatory
variation of force as a function of lateral displacement was observed. Thi wa attributed
to atomicscale stickslip behavior.
Rentsch el. al. [26] used the LennardJones potential in 1994 to model the
indentation and scratching of copper by a diamond indenter. The purpose of thi study
was to simulate the action of a single abrasive grain during UPG. Chip formation and
pile up was observed. The scope of this study was limited by the computational demand
of MD  to scratch for long enough lengths to observe more of the chip formation
processes required unreasonably long simulation times.
Indentation of silicon by a diamond indenter was investigated by Brenner et. al.
[27] in 1996. Using the Tersoff potential for silicon, and the LennardJones potential at
the interface, pure elastic deformation was observed in the substrate with an indentation
depth up to 0.6 nm. Further indentation produced irreversible damage to the substrate,
with the profile of the damage normal to the surface matching the indenter profile, and
circular deformation on the inplane profile.
In 1998, Yan and Komvopoulos [28] used the LennardJones potential to model
FCC crystal substrates with a single atom or with rigid indenters at various temperatures.
19

Jumptocontact phenomena during indentation were ob erved and the normal force on
the indenter varied in a sawtoothlike pattern. Material differences had trong influence
on the resulting forcedistance curves, and elastic stiffness and yield trength were found
to decrease with increasing temperature.
In 2000, Komanduri et. al. [29J used the Morse potential to investigate anisotropic
effects during the indentation and scratching of single crystal aluminum. Measured
hardness was found to increase with a decreasing indentation depth. Variation in
measured hardness and friction coefficient values resulted from variation of the cry tal
orientation. Also, a method was presented to quantify and graphically display the amount
of disorder at various positions in the machined substrate throughout the indentation
process.
2.3 Development of the MEAM
In 1972, R. A. Johnson [30] addre sed the inadequacies of pair potentials when
addressing point defects in metals, including vacancie and ubstitutional and inter titial
impurity atoms. The LennardJones, Morse, and BornMayer functional forms were
investigated, and functional parameters were derived. These simple twobody potential
were found to yield poor agreement with vacancy properties. Johnson made the
observation that the only way to obtain physically realistic results is with a short ranged
empirical type of potential.
20
Specifically, Johnson suggested that a new approach to the development of
potential models was needed. The commonly used pair potential, LennardJone ,
Morse, and BornMayer, were based on the behavior of only a very few atoms. These
potential models were then assumed to apply to larger systems of atoms. The empirical
approach taken by Johnson involved pure curve fitting to attain agreement with known
physical properties of the material in question.
The concept of the "quasiatom" was introduced by Scott and Zeremba [31] in
1980. Using densityfunctional theory, a method of estimating the energy of an impurity
in a host lattice of atoms was presented. The entire impurity atom, the nucleus and
electron cloud, was treated as a single unit, or "quasiatom". The focus of the quasiatom
theory was on the effect of the host on the embedded quasiatom, not on the effect on the
host lattice. The potential energy of an embedded impurity was considered to be a
function of the electron density contributed by the host at the site of the impurity. This
electron density wa calculated without the impurity in place.
This was the key issue  without the impurity, the electron density of the
otherwise pure lattice can, for many materials, be e timated with density functional
theory. With this new method, the authors were able to reproduce correct qualitative
trends for bUlh hydrogen and helium, which are chemically very different.
21
Puska et. a1. [32] expanded thi work in 1981. Energie for atoms of hydrog n
through argon in a homogenou electron gas were determined u ing the den ityfunction
scheme. Comparisons to the work of Stott and Zeremba were favorable.
Daw and Baskes [33] introduced the Embedded Atom Method (EAM) in 1984.
The quasiatom concept was borrowed and applied to any atom in a ho t lattice. The
electron densitydependent term was augmented by a pairwise term to represent the coreto
core repulsion between atoms. These pairwise and density term were derived for
nickel and palladium empirically using properties such as elastic constants, heats 01
solution, and migration energy. These methods were applied to calculate surface energies
for various crystal orientations, and hydrogen migration in bulk Ni and Pd and on Ni and
Pd surfaces with good results.
In 1986, Foile , Daw, and Baskes [34J published a consi tent et of EAM
embedding and pairwise functions for Cu, Ag, Au, Ni, Pd, and Pt. Equations were
presented in a parameterized form, which permitted the imulation of each of these FCC
materials with the same basic model. Good agreement with experimental values for each
material was reported when calculating migration energy of vacancies, formation energy,
surface energies, and other properties.
Baskes [35] extended the EAM to model silicon, a covalent material, in 1987. It
was found that a simple firstneighbor EAM model was sufficient to de cribe the
geometry and structure of many metastable phases of silicon, but could not describe it
22

shear behavior. Thi was due to the fact that the EAM model spherically average the
electron density contributions from each atom. Directional component of bonding,
important for covalent materials such as silicon, are not addressed by the EAM model.
To compensate for this, Baskes adjusted the EAM to account for bondbending force.
Baskes' model reproduced the bulk lattice constant, ublimation energy, and bulk
modulus of diamondcubic silicon to within 5% of experiment.
Johnson and Oh [36,37] applied the EAM to HCP metals in 1988 and BCC metals
in ]989. However, the most significant extension to the BAM came from Baskes et. al.
[38] in 1989 with the Modified Embedded Atom Method, or MEAM. In this model, the
angular components of bonding were formalized. A set of functions were presented
which were applicable to silicon and germanium, and their alloys. Comparison of the
predictions of this model to firstprinciple calculations and experiment were favorable for
the calculation of the energetics and geometries of point defects, urface, metastable
structures, and small clusters.
The MEAM was further extended and formalized by Baskes [39] to a et of
parameterized equations for silicon, germanium, diamond, many BCC and FCC
transitions metals, and some alkali metals. Baskes cautioned that this extension was
empirical and was not justified by strong physical arguments, as had been the BAM and
many pairwise potentials. Despite this caution, Baskes confirmed that his model could
describe the elastic behavior and simple defect properties of these diverse materials, as
well as bulk structural and surface propertie. Also, with this MEAM model, it was
23

sufficient to consider firstneare t neighbors only, ince energy difference w re
accounted for by angular terms In the electron density function, not by long rang
interactions.
This model, as set forth by Baske ,wa used a the basis for thi inve tigation
into nanometric indentation.
24

CHAPTER 3
PROBLEM STATEMENT
Three basic aspects of molecular dynamics simulation are addre ed in thi tudy
with the intent of offering improvement in each area. These three objectives, the focus of
this study, are as follows:
Objective 1: It is necessary to have a computing environment and algorithm
design which can perform the necessary calculations in a reasonable amount of time for a
rea onably large system of atoms. While "reasonable" is a relative term, it is defined
here to mean a few to several thousand atoms in no more than a day or two. Also, while
a certain level of performance may be acceptable, additional speed is always desirable.
As a particular MD algorithm becomes faster, more simulations may be run in the same
amount of time, or larger systems of atoms may be studied. The desire to achieve
additional computational speed always exists.
For this reason, the present study includes the development of a MIMDstyle
parallel algorithm to decrease simulation time. This is especially neces ary for the
MEAM because its mathematical complexity makes it slower as compared to
25

pairwise potentials. Also, since this i an entirely new area of work for thi re earch
group, the methods used and discoveries made in this investigation will provide guidance
for the future expansion of the distributed computing cluster. AI 0, the e technique are
by no means limited to the MEAM potential  future parallelization of the Morse
potential, based on the techniques developed in this study, would allow the modeling of
many more atoms that would otherwise be possible. The software written in support of
this research is modular and may be extended to other potential models without great
difficulty.
Objective 2: When conducting MD simulation, an accurate potential model is
desired which, for the physical processes and materials to be simulated, accurately
describes the behavior of the atoms in the system. This study focuses, in part, on the
implementation of the MEAM to model nanometricscaJe indentation. The deficiencies
of the Morse potential, and pairwise potentials in general, have already been enumerated.
The MEAM will be developed and its suitability for nanoindentation will be evaluated.
The insight gained will also allow a sessment of the possible application of the MEAM to
the modeling of other manufacturing processes.
Objective 3: For successful MD simulation, a program is needed which correctly
implements the desired potential model, and passes validation tests, such as conservation
of energy and backintegration. Due to the mathematical complexity of the MEAM
potential, this step requires additional attention and care. This investigation will
document the accurate implementation of the MEAM. Verification of the model via
26

validation tests will be demonstrated, for interaction between both like and unlike
materials.
27

CHAPTER 4
THEORY OF MD SIMULATION
4.1 Introduction
This chapter will discuss the details and mathematics of molecular dynamics as
applied to manufacturing processes simulation. The discussion is divided into six broad
sections which include the selection of physical units used in the MD model, calculation
of interatomic forces, integration of the equations of motion, boundary and thermal
considerations, crystal relaxation, and algorithm design issues.
4.2 Units Used in the MD Model
The selection of physical units is an extremely important point when designing a
computerbased model of a physical proce s. Computers have a limited precision
available when performing floatingpoint arithmetic. This important fact can cause a
perfectly valid model to yield invalid results if the programmer does not take care.
Limited preci ion results in roundoff and truncation error when performing computerbased
calculations. For example, when adding numbers of various magnitudes, different
answers may result depending on the order in which the summations are made. While
28

some error is unavoidable, it can be minimized to the point of irrelevance with the careful
selection of units.
Units should be chosen such that the raw values used in the simulation do not
become too large or small. If all numbers are close to 1.0, plus or minu a few orders of
magnitude, rounding and truncation errors will generally be insignificant. This guiding
principle was used when selecting the following physical units for the MD simulation
model used in this study:
Table 4.1  Physical Units of the MD Model
r"  _._. _.  
j Fundamental
Unit Used I Parameter
II
length angstrom (A), 10. 10 m
mass
atomic mass unit (amu),
1.66x10·27 kg
energy
electron volt (ey),
160x10·1g J
time 1.02x10·14 s
/
The time unit, or time step is not selected  it is determined by the choices for
length, mass, and energy.
29

4.3 Calculation of Interatomic Forces
Successful trajectorybased molecular dynamics simulation ha a one ba ic
requirement the ability to calculate accurately and quickly all forces on all atom in the
system. The force on a single atom i i defined as the negative of the rate of change of
the total potential of the system as the position of atom i is varied, with all other atom
remaining stationary. All atoms k which bond to i must be taken into con ideration:
1'. =_" aV,otal r . J, £..J:'\ Untt
k"'; orik
(I)
A different but equi valent formulation can be deri ved by separate consideration of
the three orthogonal space dimensions:
f. =  , aVrota,
I.Z £..J"'I
bi OZik
(2)
(3)
(4)
While either formulation may be used, the second is more convenient in orne
cases. This is particularly true when forming analytical expres ion for forces for the
electron density terms of the MEAM.
30

Forces may be found via the above definitions u ing analytical or numerical
means. Analytical forms of the forces are necessary for accuracy and peed, while
numerical estimates of forces serve as a useful diagnostic tool. Numerical derivative
were determined in this study using a formula suggested by Raff [40] a being wellsuited
for MD force estimation:
(tVt   _ =O.75S, O.15S2 +O.016666S,
~ ~~ >
where:
SI =V(xo+&)V(xo&)
&
S2 =V(xo+2&)V(xo2&)
L\:t
Sl =V(xo+ 3&) V(xo  3&)
. ill
The parameter tu IS selected to gIve the greatest accuracy.
(5)
(6)
(7)
(8)
With infinite
floatingpoint precIsIOn, a smaller value for ill would alway give more accurate
estimates. However, with the 64bit double precision on the Digital Alpha C compiler
and processors used in this study, there is a lower limit below which accuracy suffers.
Determination of the optimum value of & was done through trial and error.
While these formulas for numerical estimates of forces are easily implemented,
the force values they generate have some inaccuracy built into them. Also, use of the e
31

formulas requires a great many evaluations of the potential of the ystem of atom which
is very time consuming. To ensure a valid simulation which execute in a reasonable
amount of time, analytical forms must be derived using Eqns. 14.
For the MEAM, this is a tedious and timecon uming task. The MEAM (a
discussed in Chapter 5) is mathematically complex, especially as compared to imple
pairwise potentials. To reduce the complexity of this task in this inve tigation, each
individual term In the MEAM potential expression was handled one by one, and
comparisons of analytical derivatives were made to numerical derivatives to verify
correctness at each stage.
4.4 Integration of the Equations of Motion
Given the tate of the MD model at a given point in time (i.e., the position and
momentums of all atoms in the system), and forces calculated as above, a new tate of the
molecular dynamics model may be determined at a more advanced point in time through
integration of Hamilton's equations of motion. For each atom, the following six coupled
firstorder differential equations must be solved:
dXi Pi .., = (9)
dt mj
dYi =Pi.) ( 10)
dt m,
dZ i = Pi.z (II)
dt mi
32

_d'p_..X=f
dt I,X
dp.
_/'_Y =f
dt I,y
_dp,_.,z=f
dt '.Z
(12)
(13)
( 14)
In these formulae, p represents momentum,jforce, m mass, and t time. Ln other
words. two coupled integrations over the three space dimension must be performed,
Force is integrated to yield an updated momentum and momentum is integrated to yield
an updated position. This processes is repeated to establish the trajectories of the atoms
in the MD model through time.
The integration method used in this study was the fourthorder RungeKutta
method. While other methods may also be used with success, the RungeKutta method is
selfstarting, stable, it yields good accuracy for rea onable time step" and it i not
difficult to implement.
4.5 Boundary and Thermal Considerations
Due to the computationally intensive nature of trajectorybased molecular
dynamics simulation, it is important to include in the simulation model a minimal number
of atoms. For the computing resources available in this study, an atom count of roughly
lOS atoms was the maximum system size possible in order for the simulation to complete
in a reasonable amount of time. However, even many orders of magnitude more atom
33

would still be wholly in ufficient to represent actual tools or ub trate. For thi rea on,
it is necessary to impose artificial conditions at the boundarie of the MD model to mimic
the effects of bulk material out ide the model.
Bulk material beyond the modeled portion of the substrate contribute two basic
effects: resistance to deformation and translation, and thermal influence . These effect
are simulated through the creation of three distinct atom types: moving atoms, peripheral
atoms, and boundary atoms as proposed by Riley et. al. [41] in 1988.
Moving atoms represent the majority of atoms in the MD model, and are
considered as "normal" atoms, with no special constraints or consideration imposed
upon them. Moving atoms are free to move under the influences of forces applied due to
neighboring workpiece and/or tool atoms.
Peripheral atoms are also free to move under the influence of applied forces.
Peripheral atoms are established in a layer with a thickness of one lattice con tant over
the surfaces of the simulation model where additional bulk material i assumed to exist.
To emulate heat transfer to the bulk workpiece, the velocities of the peripheral atom are
reset periodically according to:
(15)
That is, at some reset time tn' a new velocity for peripheral atom i is set to a
random value (as determined by ~) selected from a Boltzmann distribution at
34
I
temperature T. The random velocity is modified by the old velocity as determined by the
parameter W, which determine the strength of the reset. Following Riley, the frequency
of the reset is set to five times the Debye frequency of the lattice, and w is set to 0.1047 .
Typical machining processes generate significant amount of thermal energy near
the cutting, grinding, or indentation action. This thermal energy, manifested i.n the
kinetic energy of the moving and peripheral atoms in the system, will typically be
attenuated through the peripheral velocity reset function, simulating heat 10 to the bulk
workpiece.
The outermost edges of the MD model where bulk workpiece i assumed to exi t
are comprised of boundary atoms. Equations of motion are not integrated for boundary
atoms, and they are not allowed to move under the influence of applied forces. In fact,
forces on boundary atoms are usually not calculated to save time. Boundary atoms
simulate the resistance to deformation and translation which bulk material would provide.
In addition, when a workpiece or tool is moved to model some proces , such a
indentation, the boundary atoms are moved at the desired velocity. The forces they exert
on moving and peripheral atoms results in translation of the entire workpiece or tool.
An illustration of a typical arrangement of these three atoms types is provided in
Figure 4.1. All surfaces but one have these layers of peripheral and boundary atoms.
This single "free" surface is where indentation occurs.
35

Figure 4.1  Arrangement of Peripheral and Boundary Atoms
•D
D
moving atoms
peripheral atoms
boundary atoms
4.6 Crystal Relaxation
A pure crystal of any given transition metal at room temperature has a longrange
order (face centered cubic or body centered cubic) and a lattice spacing associated with it.
Within the bulk of a perfect cry tal, there is little or no deviation from this geometric
definition of atomic arrangement  near the edges of the crystal, however, deviation from
this ideal arrangement is necessary for the crystal to achieve a minimum potential state.
To reach this minimum potential state in an MD simulation, the workpiece lattice
IS relaxed prior to the start of the experiment. A widely used, effective, and easily
36

implemented method of relaxation is the damped trajectory method. With thi method,
effects of the too) are ignored and equations of motion for the workpiece atoms are
integrated, allowing the atoms to move in response to intracry tal forces. By this
process, atoms will naturally approach po itions which yield a local minimum in the
potential hypersurface  momentum of the atoms, however, will cau e the atoms to move
past their relaxed positions, increasing the total potential. To prevent this, when an
increase in the total potentia) occurs, momentums of atoms are set to zero. This process
is continually repeated, causing the system to oscillate about the minimum potential
configuration with a decreasing amplitude of oscillation. When the minimum has been
approached to a reasonable degree of accuracy, the process is terminated, and the crystal
is completely relaxed.
This damped trajectory method has been used exclusively by Dr. Raff, Dr.
Komanduri, and their research groups. It is the best choice for simple pairwise potential,
and is easily programmed. With the MEAM, however, the damped trajectory method
proved to be very slow, and an alternative method wa implemented. This method, the
conjugate gradient method. treats the relaxation process as a pure optimization problem its
use in this study is documented in Chapter 6.
4.7 Algorithm Design Issues
While accuracy of the molecular dynamics algorithm is the primary concern when
coding the model, efficiency is second. Knowledge of basic computer science and
37

computer engineering principles, and familiarity with the specifics of the computer
system being used allow the MD simulation programmer to write programs which run
faster than would otherwise be possible.
When designing a computer algorithm where performance is an issue, there are
two pairs of conflicting goals which must be balanced against each other:
• Redundant calculations may be avoided at the expense of increased memory
usage.
• Speed may be increased at the expense of program readability, simplicity, and
dchugging ease.
A few decades ago, there was less freedom when balancing the first pair of issue
due to the limited memory available on typical computers. A memory became cheaper
and smaller, bookkeeping techniques, such a the neighbor list and linked cell method,
became feasible.
Neighbor lists refers to the practice of determining which atoms are sufficiently
close to bond and storing in memory these pairs of bonded atoms. Typically, atoms are
considered to bond if there separation distance is less than some fixed distance, known as
the cutoff radius. This neighbor list is potentially very large, but the same Ii t may be
used throughout potential and force calculations for the same time step. Additionally,
neighbor lists may be used over multiple time steps if bonds are stored for atoms with a
separation distance larger than the cutoff radius. With this approach, it is assumed that
38

this augmented bond is valid for a few or more time steps, and that it require Ie s
frequent updating. Determining the extended cutoff radius and the number of time tep
for which it is "safe" to use this bond lists requires careful attention and typically depend
on the potential model used.
The simplest way to determine the bond list is to compute the square of the
distance between all possible pairs of atoms, and compare this di tance to the square of
the cutoff radius to see if a bond is formed (using squares of distances avoid quare
roots). This step takes an amount of time on the order of the number of atoms squared.
However, if the simulation space is subdivided into cubes, or cells, and if the
atoms are presorted within these cubes, then to determine bonds, atoms need to be
compared only to atoms in neighboring cells. This technique is called the linked cell
method  it is not difficult to implement, but determination of optimum parameters, such
as cell size and number of cells is not easily determined. Ala, with different izes of
MD models, the optimum choice of parameters changes. However, with well chosen
parameters, the time required to form a bond list with the linked cell method is close to
being on the order of the number of atoms to the first power.
These two techniques are standard in most modern and wellwritten MD
algorithms. They illustrate the need for compromise mentioned above: use of these
methods gives greater speed, but more memory is required, and their use complicates the
algorithm considerably, making it more difficult to debug, modify, and harder for others
to understand.
39

These features are incorporated into the MD aJgorithm implementing the MEAM
developed in this investigation. Additiona]]y, orne lowerlevel optimization were u ed
which reduced computation time by a few percent. With a imple pairwi e potential,
these effort would have a negligible effect  with the computationally intensive MEAM,
however, they were worth the effort. On a computer, floating point multiplicatIOn i
faster than division, so division was eliminated and/or minimized where pos ible. Loop
counters were specified to be stored in CPU registers where possible. The number of
variables used within a loop was minimized to allow, if possible, complete caching (that
is, storage of data in memory which is faster than RAM) of data used in a loop. La tly,
while extensive use of global variables is generally considered poor programming
technique, they are preferable when speed is an issue  passing values to and from
functions and procedures requires pushing/popping data from the stack, which requires
additional time.
40

CHAPTER 5
THE MODIFIED EMBEDDED ATOM METHOD (MEAM)
5.1 Introduction
The quasiatom principle in conjunction with density functional theory gave rise to
the Embedded Atom Method (EAM). The MEAM is an extension of the EAM where
angular bond dependence has been incorporated into the computation of electron density
at the site of any atom. Further details on the history and development of the MEAM can
be found in Chapter 2.
The MEAM as set forth by Baskes [6] is the ba is of thi study. This chapter will
describe the MEAM, how it was implemented in this study, and the methods used for
verifying the accuracy of the program.
It is important to note that Baskes warned about his extension of the MEAM  the
extension was empirical and has not been justified by strong physical arguments, as have
the EAM and pairwise potential methods. Baskes' purpose was not to derive optimum
parameters for each material considered. but rather to set up a framework for atomistic
calculations.
41

For the extension to binary systems, Baskes assumed that the partial electron
density weights depend only on the embedded atom type, and not on the type of atom
contributing the density. Also, electron densities are not scaled. Baskes warned that
these assumption are extreme, and recommended further investigation before
widespread application.
In this investigation, these warnings are kept in mind. Evaluation of the accuracy
and suitability of the MEAM is presented in Chapter 9.
5.2 Formulation of the MEAM for Interactions Between Like Materials
The total potential energy of a system of atoms using the EAM is given by:
v =I(FJpJ+ 2
1 I<Pik (rik ))' (16)
, b'l
This formula gIves the total energy as a sum over all atoms i. (Note that
subscripts denote material types as well as specific atoms). The potential energy for a
single atom i is a function of a linear superposition Pi of spherically averaged electron
densities at the site of atom i. In this initial formulation, the pairwise term, <Pik ' was
purely repulsive, and represented core to core electrostatic effects.
In the MEAM, the electron density at site i, Pi' is augmented by angular terms
and is renormalized by the number of nearest neighbors, Z;:
42

(17)
The pairwise term has taken a more general form in the MEAM. For interaction
between the same material (that is, atoms i and k are the same material):
(18)
The first term represents an average of the energy per atom of the reference lattice
,~'' ~ .~ ..  ~ .. " .. _...... ~. . .
at each of the nearestneighb.o.r..distances. ·The second term is formed by the differenc.e ~_ .. ~' .' ... '.
between the embedding energy at the .pa~tcgro\tnd.electron density actually seen by ~tom i
.~... ._.... ~._. .'. .' . _., ....... . .
and the average embedding energy atthts ..":.tglp inJhe reference lattice at each of the .  . ", .
nearestneighbor distances.
The total energy of a system of atoms of the same type using the MEAM is
therefore:
v = ~[F.(p~. )+_1~ g'(r. )__1~ F;(p;O(r;k)))
(Olal £.. ( Z. Z £.. ' It Z £.. Z
i'iJ('~i i k~i ;
The first partial background electron density at site i, 15;0, is given by:
15;0 =I p :(0) ('it)
."',
43
( 19)
(20)
The total electron den ity at site is defined in term of the fir t partial
background density (Eqn. 20) and three correction terms that explicitly depend upon the
relative positions of neighbors of atom i:
(21)
The correction terms are:
(22)
(23)
(24)
where XU = ri~ / and ri~ is the ex component of the distance vector between atoms k
Ik /rik
and i. The forms are chosen such that the partial background electron densities are
invariant to lattice translation and rotation, scale simply with atomic electron den ity for
homogeneous deformation, and equal zero for a cubic lattice with a center of symmetry.
The terms of the form p;U(/) (r) are given by:
where:
PiUUJ (r) =exp( b')
44
(25)

(26)
The energy to embed an atom at site i is a function of tbis den ity, and is given
by:
(27)
This form is unchanged from the EAM and has been shown previously [4] to give the
correct coordination dependence between bond length and energy.
Finally, expanding the pairwise term E;' :
(28)
where:
(29)
In these formlu a'tlons, the terms EOi' RO A {3(Q) {3(I) {3(2) {3(3) (0) j' u;' j' j' i';' i ' t; •
til), tj
(2) ,t;3), Z;' siO) , S;(I) , S;'2) , and S;'3) are all specified parameters for atoms i of a
gi yen material.
45
...
5.3 Formulation of the MEAM for Interactions Between Unlike Materials
In previous work [4], when formulating the EAM's application to alloy ystems
and impurities, the dilute heats of solution were used to scale the electron densities, and
unlikeatom pair potential parameters where determined u ing a geometric mean between
the likeatom potentials.
In the MEAM a different approach is used. Pairwise parameters are derived by
considering, notadiJut~ _.al.lQY ~t~1l! .bu~ _~!1__~quiato~~__.binary intermetallic alloy . " . .  .. '., '
system, where each i atom has only k neighbors, and vice versa. With this approach, the  ........ ." .. ~ .... .. ~~.  ...... ._ _.. ~~ .. 
following pair potential terms are derived for interactions between unlike materials:
(30)
o (£,0 + £~ J where Eik = 2  tJ. ik (tJ.'k is the heat of formation for equiatomic compound of
material i and k), a ik =(at ~ak), 2'k is the number of nearest neighbors to an i or k
atom, and Ri~ is calculated from the assumed equilibrium intermetallic atom volume.
No modification to the embedding energy term is necessary. However, care mu t
be taken with the specified subscripts,
46

In summary, with the above noted modification to the pairwi e term, Eqn. 17
also applies to unlike materials.
5.4 Angular Screening
In the application of the MEAM, Baskes reported "it was found that a
simplification to firstneighbor interactions for all crystal structures wa pos ible" [6].
This minimizes the number of bonds which must be considered, which increa e
simulation speed. However, it becomes necessary to define what a first nearest neighbor
is, especially in situations where crystal structure has been distorted.
Such a procedure must be continuous in the energy and its first two derivatives to
ensure that force discontinuities do not occur. One method is to institute a fixed cutoff
di tance, beyond which potential and force effects are ignored, or beyond which all
distancedependent functions are smoothly forced to zero. While this method is
sufficient for pairwise interactions, Baskes proposed implementing a "screening" method
in addition to a static cutoff.
The initial screenmg method proposed by Baskes [6] had some deficiencies
namely a discontinuity in the second derivative. Baskes [42] later developed an
alternative method which corrected these errors  this corrected version was used in this
study and is presented here.
47

A screening method should take into account the actual geometry of the atom
under consideration. A pair of atoms, i and k, may be fully screened, partially screened,
or unscreened depending on atoms which may be between i and k ( ee Figure 5.1). If an
atom j is inside the ellipse C =0.8 , atom j completely screens i and k, and i and k are
considered to have no interaction. If atom j is outside the ellipse C =2.8 , then i and k are
completely unscreened, and their potential effects are unmodified as given by Egn. 17.
Finally, if an atom j lie in the intermediate region (such as atom j in Figure 5.1), then
atoms i and k are partially screened.
C=2.8
C=O.8 CD
Figure 5.1  Illustration of Angular Screening Method
This screening method is formulated as follows. Equations of ellipses formed by
atoms i and k on the major axis, with parameter C, are:
48
paz
(31 )
where the x and y axes are in the plane defined by atoms i, j, and k. For a given atom j,
the parameter C is determined by:
Here, C"",x and Cmil! are the limiting values of C as shown in Figure 5. J. The smooth
atoms. A screening factor Sijk for any three atoms i, j, and k is defined by a smooth
where X;j ~ (XJand X j' = ( r;(J'and ris the distance between the denoted
(33)
(32)
(34)
x ~ I }
O<x<l
x$O
S f (
C  Cmin ] ijk  c Cmu  Cmin
f)x) =l~(i  x)' f
c = 2(X ij + XjJ (X ij  XjkY 1
l(Xij  XjkY
function of the parameter C defining the ellipse fonned by i, j, and k:
cutoff function is the following piecewise continuous function:
where the argument x is the argument defined in Egn. 3 t .
49

Finally, the screening factor between a pair of atom i and k i given by:
Sik = nS;jk
j~;.k
(35)
In other words, the screening between two atoms i and k is the product of all
partial screening factors formed by considering the screening effects of all other atoms j
in the system. If any atom j completely screens i and k, the product becomes zero, and
complete screening occurs. If no atoms screen i and k, all Sijk I S are 1, giving Sik =I ,
corresponding to no screening. Finally if partial screening occur without any complete
screening, Sik will be some value between zero and one.
This screening function is incorporated into the MEAM for like materials by
modifying the pairwise terms in Eqn. 19 as follows:
v ="(F(P~' )+ _1"S. E" (r ) _1"S F;(p;O(rik ))J (36)
lotal ~ I Z. Z ~ Ik I Ik Z ~ Ik Z
iiik;f4; ; krt.i i
Also, the screening function affects the electron density through modification of
the partial electron densities (Egns. 2024):
(37)
(38)
(39)
50

Modification of the MEAM of unlike materials is done in a similar fa hion.
5.5 Validation of the Implementation
(40)
The MD program implementing the MEAM was tested using the following two
rigorous tests:
• Conservation of energy test
• Backintegration test
The can ervation of energy test determines if the analytical forms of the
interatomic forces derived according to Eqn. I are correct. If energy is can erved, the
urn of potential and kinetic energy of the system of atoms should remain constant to
within a reasonable degree as determined by the accuracy of the integration of equation
of motion.
For a system of fourteen copper atoms, energy was conserved as shown in Figure
5.2. The same constancy is seen for any material and any size ystem.
51
50 60 70 80 90 100
Potential Energy
Total Energy
Kinetic Energy
10 20 30 40
60
40
20
 0 >Q)
 20 > .C..) Q) 40 s::
W
·60
·80
·100
120
0
....
Time Step
Figure 5.2  Conservation of Energy for Cu System
For interactions between unlike materials, the MEAM formulates different
equations. Because this necessitates the use of code within the program separate from the
code tested above, it is also nece ary to test for conservation of energy when different
materials interact. For a system of seven silver atoms and seven iron atom , refer to
Figure 5.3.
52

50
Kinetic Energy
0
S
a> >
t»
~
a> 50
c: w Total Energy
100
10 20 30 40 50 60 70 80 90 100
150 ~,r........,..rrr,,",1
o
Time Step
Figure 5.3  Conservation of Energy for AgFe System
A rigorous test of the integration function may be done using the backintegration
test. If the integration method is properly coded, using a negative time step will simulate
the system of atoms backwards in time. The backintegration test involves integrating the
equations of motion for a system of atoms normally for some period of time, followed by
reversing the integration direction. A properly working integration function will be able
reproduce the same motion of the atoms, along with the same potential and kinetic energy
values. This should occur for many time steps until normal integration error forces the
system to another path. Figures 5.4 and 5.5 illustrate backintegration tests for the same
systems used in the conservation of energy tests using an integration time step of 0.1.
53
40 50 60 70 80 90 100
Total Ener
Kinetic Energy
20 30
_____~P_o_tentiaJ Energy ~_..._
10
60
40
20
 0 >C1I
 ·20 >
C'l
~
C1I ·40
l:
W
·60
80
·100
120
0
Time Step
Figure 5.4  Backintegration for eu System
40 50 60 10 80 90 100
Time Step
54
30
Figure 5.5  Backintegration for AgFe System
10 20
·150 +.,.,r..........,r~
o
50
Kinetic Energy
 0 >C1I >
C'l
~
C1I ·50
l: w
Total Ener
100
....
As can be seen, the MD model backintegTates very well. Only a light thickening
of the lines demonstrates the slight deviance which results from the small amount of
unavoidable integration error.
55
t..
It
CHAPTER 6
RELAXATION WITH THE CONJUGATE GRADIENT METHOD
6.1 Introduction
In molecular dynamics simulation, a computer program which correctly
implements the desired potential model, which computes correct forces on all atoms in
the system, and which properly integrates the equations of motion, is essential. However,
a functional program which runs very slow is almost as useless as a nonfunctional
program.
This issue became evident with the completion of a working MD simulation code
implementing the MEAM. Before largescale application of the code to problems of
interest, such as indentation, significantly greater speed was needed. While use of
neighbor lists, the linked cell method. and various lowlevel optimizations (as discu ed
in Chapter 4) helped, much more improvement was necessary.
As a first step in addressing this issue, it was necessary to identify the bottleneck
in the MEAM program. The important functions in the code were timed for various
56
..
(rtf
II
t
1000 1200
forces (MEAM)
800
..... \ .. .b~:~;~~~~~M
forces (Morse)
potential (Morse)
600
Number of Atoms
200 400
1000
100
 en 10
"Cc0u
CI)
en Gl
E 0.1 i=
c::
::::s
a:: 0.01
0.001
0.0001
a
Digital Alpha 500au workstation and compari on are made to the Mor e potential.
system sizes as shown in Figure 6. I. Reported run times are for a ingle proce or
Figure 6.1  Run Times of Significant Functions in the MEAM and Mor e Code
As can be seen, the MEAM is slower that the Morse potential. With the Morse
potential, the formulation is quite simple, so calculation of potential or forces will be
dominated by the function which determines the bond list. Also with the Morse potential,
there is no large difference between the times required to calculated forces and potential.
With the MEAM, however, additional overhead is imposed due to the need to
calculate screening factors. More significant, however, is the most time con uming
function, the forces function. Due to the complex form of the potential, and screening
57
, eMrI

factors, a great many complex terms appear in the analytical form for the forces.
Calculation of these many telm is quite time consuming.
While parallel processmg can be applied with great ucces, as di cus ed in
Chapter 7, a less drastic first measure is possible, namely the implementation of a new
method for relaxation  the conjugate gradient method. This new technique take
advantage of the large run time disparity between the forces and potential functions with
theMEAM.
6.2 Optimization of the Relaxation Process
As discussed in Chapter 4, the damped trajectory method is an effective method
for relaxation. It is easily implemented, since it reuses existing code. As discussed in
Chapter 4, equation of motion are integrated and atoms are forced by the laws of physics
toward a minimum potential configuration. Convergence is assured through timely
cancellation of momentum.
The relaxation process is not required to conserve energy. A a result, a large
integration time step may be used as integration error is not a key issue. Also, the mas es
of atoms may be temporarily set to small values so they will accelerate more quickly
under the action of applied forces.
58

The damped trajectory method, by integrating equation of motion, make many
calls to the forces function. This is fine for pairwise potentials, where calculation of
forces is no more time consuming than calculation of potential. For the MEAM,
however, calculation of forces is much more time consuming than calculation of
potential. For this reason, a relaxation method which can make more efficient use of
gradients of the potential hypersurface would be well suited for the MEAM. The
conjugate gradient technique is such a method.
6.3 The Conjugate Gradient Method
The conjugate gradient method is a wellestablished optimization technique. The
form of the conjugate gradient method used in this study is based on the formulation by
Hagan and Demuth [43] for the training of neural networks.
The conjugate gradient method involves the determination of a search direction
based upon the local gradient of the potential hypersurface. An interval i found which
contains a local minimum along this search direction. This interval is then reduced to a
specified tolerance. These steps are repeated until the change in potential from the
previous step falls below another specified tolerance. With this method, forces are
calculated once per step to determine the search direction  the interval location and
reduction steps involve potential function calls only.
59
/
An overview of the conjugate gradient algorithm i as follow :
) Set initial search direction to be the negative of the gradient, Po = go'
2) Locate an interval in the current search direction containing a local minimum.
3) Reduce this interval to a specified tolerance using a Golden Section Search.
4) If the change in the potential from the previous tep is below a specified
tolerance, quit.
5) Set new search direction as follows:
a) If the number of iterations exceeds some specified number, reset the
algorithm by setting the new search direction to the new gradient.
b) Otherwise, determine the new search direction according to:
6) Go to step 2.
The potential function is called repeatedly during each tep in order to locate and
minimize the current interval, but for each step, the gradient (forces) need to be
calculated only once. For the MEAM potential (and presumably for any potential model
where forces are significantly more computationally expensive the potential), this speeds
up the relaxation process considerably.
Figure 6.2 shows a comparison of the two relaxation methods. For the damped
trajectory routine, the usual techniques are used to speed up convergence  atomic
weights are set to 1.0, and a large time step of 0.25 is used. The algorithm is repeated
60
.,.....
until 30 potential increases occur, which has been found to be uffici nt to reach a
minimum to almost all the available digit of precision for smaller time steps (~ 0.05 ).
Tolerances for the conjugate gradient method are 101 and the algorithm is reset every
five iterations. The frequency of the algorithm reset wa determined heuri tically to give
the quickest convergence for the tests summarized in Figure 6.2. As before, run time are
for a single processor Digital Alpha 500au workstation. Also, in every case tested, the
two methods converged to the same atom arrangement.
Damped Trajectory
Conjugate Gradient
45000
40000 0
(1) 35000
Cf) >< cu 30000
Qi
a:
0 25000 'C
(1)
L. 20000
:;:,
D'"
aG:l 15000
Gl
E 10000 t=
5000
0
0 100 200 300 400 500 600 700 800 900 1000
Number of Atoms
Figure 6.2  Comparison of the Damped Trajectory and Conjugate Gradient Methods
As can be seen, the conjugate gradient method can converge to the desired
minimum much more efficiently than the damped trajectory method for the MEAM. For
1000 atoms, the time required to relax has been reduced by a factor of approximately
mne. Similar improvement is to be expected for any potential model where
61
.... I
determination of forces on atoms
potential of the ystem of atoms.
significantly lower than determination of the
62
.,.
.,...
CHAPTER 7
PARALLEL ALGORITHM DESIGN FOR MD
7.1 Introduction
The history and development of the application of parallel processmg to
molecular dynamics simulation has been discussed in Chapter 2. As was discussed, there
are three main schools of thought in this area, each of which is best suited for certain
hardware and certain applications  vector processing, datalevel singlein tructionmultiple
data (SIMD) processing, and procedurelevel multipleinstructionmultipledata
(MIMD) processing. Even though each of these has been applied to molecular dynamics,
MIMD methods offer the greatest potential for accelerated simulation times.
This fact is both fortunate and unfortunate. It is fortunate because a parallel
processing environment which is wellsuited for MIMD is inexpensive to build. It is
unfortunate because programming for and building such an environment is difficult.
However, if one is willing to invest the time and has the required patience, the rewards
are well worth the effort, as this chapter will demonstrate.
63
M'
7.2 The Parallel Computing Environment
Parallel processing is an entirely new endeavor for our re earch group. In order to
test this new approach while minimizing cost, existing resources were reconfigured to
support both parallel and sequential processing. Figure 7.1 illustrates this new computing
environment.
Wide area network and internet
I Router I
I Switch I
Slave
node
Master
node and
NFS server Slave
node
Monitor,
keyboard, rmouse
KVM
switch
'jf+,I I
lupsi
Figure 7.1  Configuration of the Parallel Processing Environment
Three Digital Alpha 500au workstations were clustered with a 100 Mbps Ethernet
switch. Connection of the master node to the wideareanetwork and internet was made
64

through a router for security reason. Similarly, an encrypted terminal protocol known as
secure shell (ssh) was used for remote login instead of telnet. Due to the e ecurity
considerations, a lowlevel unsecure protocol, remote shell (rsh) was able to be u ed for
interprocess communication. A partition on the hard drive of the rna ter node was hared
among the slave nodes via networked file system (NFS). The operating ystem u ed was
the Redhat distribution of Linux, a free Unix clone. Power to the node was provided
through a uninteruptable power supply (UPS) to avoid problems due to power los . Also,
a single monitor, keyboard, and mouse was shared among the three computers with a
keyboardvideomouse (KVM) switch.
A free implementation of the Message Passing Interface (MPI) standard, known
as MPICH, was used. MPICH is a set of libraries which provide the names, calling
sequences, and results of subroutines to be called from C, C++. or Fortran programs to
exchange messages between processes. MPICH also provides functionality to start.
abort, and manage multiple proce es on multiple machines, and variou tool for te ting
and performance analysis. These testing procedures were used to assure that the cluster
was in working order.
7.3 Design and Performance of the Parallel Algorithm
As discussed in Chapter 2. the optimum MIMDstyle parallel domain
decomposition method for pairwise potentialbased molecular dynamics (and the nbody
problem in general) is welldocumented in the literature. While no specific discussion of
, I
••
domain decomposition techniques for nonpairwi e potential (e.g. the MEAM) was
found in the literature, the cases are similar enough for nbody techniques to be effective.
As compared to pairwise potentials, the MEAM additionally requires propagating
electron density values.
Briefly summarizing the discussion in Chapter 2, the most efficient approach for
MIMDstyle algorithm design is to use neighbor lists and the linked cell method. Ideally,
each cell is assigned to its own node within the cluster. If enough nodes are not
available, multiple contiguous nodes may be alternatively be assigned to a single node.
This dynamic domain decomposition technique, when properly implemented, will result
in the fastest possible computation. However, proper implementation is difficult  load
balancing must be considered (density variation in the system may overload certain
nodes), and the topology of the model must be carefully established to minimize
communication requirements. Also, algorithms running on nodes which represent cells at
the boundaries or corners of the simulation space must be specialcased.
The entire purpose of a sophisticated domain decomposition technique such as
described above is to minimize message passing requirements, the slowest component of
a typical distributed cluster. With dynamic domain decomposition, proce ses need only
communicate with other processes which are topological neighbors. Alltoall messaging
(situations where data on one process is needed by all other processes) is slow. The
dynamic domain decomposition method avoids this bottleneck, but it has an additional
tacit requirement to ensure its success  a reasonably large number of nodes. A
66
.,. ,
.,....
considerable amount of "overhead" is incurred with the distributed linked cell method
and with load balancing routines. That is, a considerable amount of computation i done
that is not directly related to the essence of molecular dynamic imulation, but i
intended to facilitate and speed up the essential function of the program by minimizing
message passing. When the number of nodes in the distributed computing c1u ter i
small, this overhead dominates, and renders the dynamic decomposition method
ineffectual. With a small number of nodes, the simulation space cannot be efficiently
subdivided.
The distributed computing cluster used in this tudy has only three nodes. This is
far below the number in a typical cluster  most clusters used for highperformance
computing have as a minimum 32 nodes, and typically have 128 or more. As a result, the
conventional approaches, including the dynamic domain decomposition method, require
rethinking.
In this study, a static domain decomposition technique was used. This technique
is mentioned only briefly in the literature because it requires alltoall communication,
and is therefore a very poor choice for implementation on large clusters. On a small,
three node cluster, it is the best choice. With only three nodes, the simulation space
cannot be divided to make a process independent of other proces es, and alltoall
communication cannot be avoided.
67
r
I,
The tatic domain decompo ition technique assign a fixed et of atom to each
process, as opposed to a signment of a fixed region of pace (a with dynamic
techniques). Because atoms may be arranged in any unpredictable arrangement, atoms
on a given process may bond with atoms on any other proce s. For this reason, alltoall
communication is necessary. However, with only three nodes in the c1u ter, each node i
sufficiently busy that the nodes, typically, will not be "starved for data", a typical
situation with alltoall communication in large clusters.
The static domain decomposition technique requires that each node integrates
equations of motion, compute bond lists, potentials, forces, screening factors, etc., for
only its own set of atoms. Bonds to any atom in the system may occur  therefore, each
process needs to know the current coordinates of and electron densities at each atom in
the system.
Figures 7.2 and 7.3 show, in a coarse sense, process flow of the root and slave
algorithms. The master (or root) process performs most of the initialization, tarts
processes on the slave node, and propagates aU necessary data to the slaves. After the
initialization stage, all nodes (root and slaves) operate more or less on an equal basis,
each doing all necessary computation for its own subset of atoms, and exchanging
coordinates and densities when necessary. Throughout the simulation, the root proces
performs all console and file I/O. While MPI and NFS will route any slavebased console
or file I/O back to the root process, this is accomplished at the cost of additional network
activity  therefore, this is avoided.
68
r
• Initialization
Initialize MPI, tart proce se on slave nodes
Proces command line parameters
Process specified input file
Allocate memory
Create workpiece and tool
Load MEAM parameters from input file
Propagate parameters and atom coordinate to slave processes
Determine an even division of atoms and assign a subset to each proce es
• Relaxation (conjugate gradient)
Write prerelax atom coordinates to a file
Determine bond list (intraworkpiece bonds only, local atoms only)
Determine screening factors (local atoms only)
Compute potential (local atoms only)
Assemble total system potential at root
Set initial search direction for local atoms to the negative of the gradient
Locate an interval containing a minimum (using parallel potential computation as above)
Reduce the interval, converging to the minimum to within a specified tolerance
If the new minimum is sufficiently close to the previous value. relaxation is complete
Determine new search direction for local atom ,resetting to the local gradient after every five
iterations
Write postrelax atom coordinates to a file
End relax
• Main Simulation
Insert thermal energy into lattice (local atoms only)
Determine bond list (all workpiece and tool atoms. local atoms only)
Determine creening factors (local atoms only)
Integrate equations of motion (locallll.Oms only)
Propagate new alom coordinate to all processes
Determine bond list (all workpiece and tool atoms, local atoms only)
Determine screening faclOr (local atoms only)
(Optional) Compute kinetic and potential energies of system to check energy conservation
Write atom coordinates at userspecified interval
Reset local peripheral atom momentums if necessary
If local process has tool atoms, move them according to specified indentation speed
Repeat until simulation is complete
Finalize MPI. kill slave processes. free memory
Figure 7.2  Process Flow of the Root Algorithm
69
,.
t
t
tn·

•
•
Initialization
Initialize MPI
Allocate memory
Receive parameters and atom coordinate from root
Receive atom assignment from root
Relaxation (conju~ate ~radient)
Determine bond Ii t (intraworkpiece bonds only, local atoms only)
Determine screening factors (local atoms only)
Compute potential (local atoms only)
Report local potential to root
Set initial search direction for local atom to the negative of the gradient
Locate an interval containing a minimum (using parallel potential computation a above)
Reduce the interval, converging to the minimum to within a specified tolerance
If the new minimum is sufficiently close to the previous value, rellUation i complete
Determine new search direction. resetting to the local gradient after every five iterations
End relax
Main Simulation
Insert thermal energy into lallice (local atoms only)
Determine bond list (all workpiece and tool atoms, local atoms only)
Determine screening factors (local atoms only)
Integrate equations of motion (local atoms only)
Propagate new atom coordinates to all processes
Determine bond list (all workpiece and tool atoms, local atoms only)
Determine screening factors (local atoms only)
Reset local peripheral atom momentums if necessary
If local process has tool atom. , move them according to specified indentation speed
Repeat until simulation is complete
Finalize MPI, free memory
• End
Figure 7.3  Process Flow of the Slave Algorithm
Parallel programming for a distributed computing cluster is tedious. There is no
integrated developing environment or debuggers for messagepassing style parallel
programmmg. In addition to the normal potential for bugs that any complex program
has, dividing the problem among separate processes each with mutual dependence has
many additional potential pitfalls. It is difficult to view data which is stored in a slave
process for debugging purposes. When a problem is encountered, the programmer is
70

frequently forced to manually log and review program output and intermediate
calculation results. Computer scientist researchers in the field of high performance
cluster computing are working to develop new and better tools to facilitate parallel
application development, but for now, things must be done the hard way.
Figure 7.4 illustrates the total elapsed time required for the sequential and parallel
algorithms for various system sizes. Measured times are fur systems of iron atoms, and
simulation time are 500 times steps with an integration re olution of 0.] time steps. The
sequential algorithm is run on the master node (with no spawning of slave processe ) to
allow meaningful comparison.
7000
6000
...
.'Q.".I 5000 =' C ·s ''
QI 4000
S
Eo:
Q 3000 0 ..C':l :::l
S 2000
r;)
lOOO
0
0 1000
Sequential Algorithm
2000
Number of Atoms
3000 4000
Figure 7.4  Comparison of Run Times for Parallel and Sequential Algorithms
71
The time savings accomplished with the parallel algorithm are ignificant. For a
system of 4000 atoms, the sequential algorithm takes almost 4.5 day  with the parallel
algorithm, just over 2 days, a speed increase factor of roughly 2.25. With three nodes of
equal capacity doing the work, a factor of 3 is the theoretical maximum performance
gain. This is not realizable because of the additional time required to pass message
among the processes. However, this theoretical maximum can be approached, as in this
case, with efficient and intelligent algorithm design.
72
,..
CHAPTER 8
MD MODELING OF NANOMETRIC INDENTATION WITH THE MEAM
8.1 Introduction
The purpose of the work summarized in the preceding chapters was to make it
possible to apply the Modified Embedded Atom Method to molecular dynamics
modeling of nanoindentation, and achieve results in a reasonable amount of time. In the
absence of these developments, namely the use of the conjugate gradient method and
parallel processing, a typical molecular dynamics simulation involving a few to several
thousand atoms would take over a week of computer time with the MEAM, using
existing computing resources. With these new techniques, less than two days are
required.
While this is significant, it is of little use if the resulting simulation do not model
reality. While this judgement is frequently not clearcut due the difficulty involved with
experimental investigations on the nanometric scale, there are validation techniques
available. The use of some of these techniques is summarized in this chapter in order to
assess the usefulness of the MEAM for molecular dynamics simulation of
nanoindentation. While the indentation process is definitely of interest in and of itself,
73
the successes and/or failures of the MEAM for indentation can erve a test case, and
guide application of the MEAM to other manufacturing proce es.
The first method for judging the validity of the MD simulations conducted in thi
study is viewing of animations of the simulations. General trends and behavior may be
observed in this way and evaluated using common sense, materials cience experti e, and
by comparing to results in the literature. This method is used here, and snap hots of
animations of the MD simulations are included to illustrate the points made.
Another method used is measurement of the theoretical strength of the indented
crystals. By measuring the average force on the indenter from initial penetration until
immediately before retraction, and dividing by either the contact or projected area of the
indenter, the strength of the crystal can be computed. These values can then be compared
to the theoretical strength of dislocationfree crystals, given by Hertzberg [22] a :
strength = %;r
where G is the shear modulus of the material.
8.2 Indentation with the MEAM at the Interface
(41)
In his extension of the MEAM [39], Baskes formulates two models, one for pure
crystals and one for alloys. With his parametric formulation for binary systems,
interactions between many different materials was described. Because one of the
74
deficiencies of MD modeling of manufacturing processes is the inability to accurately
model interactions between unlike materials, it was hoped that the MEAM could
accomplish this. This section describes the exten ion of the MEAM model of alloy to
interactions between physically separate pure crystals of different materials.
Many combinations of tool and workpiece materials were tried and the majority
were found unsuitable. To illustrate why, refer to Figure 8.1 which show the indentation
of iron with diamond, and Figure 8.2 which shows the indentation of iron with tungsten.
75
,.
..•• I . ..,.... ~.p: .. ~
..'" '.
.' .
. " . • ' .', .. I " ,. " ••
. .••• • .. • fI ..:.•.. ~ ' .""._.
;'~" ~ " ".t,·.·. · • ~ ••. ~ ~ • 6 • • • • • .. .~.. .. .. ... .. ',_ ' . .. .• .. .. t ••  .. . 0" .• ,... ." II * , •
~.... " .. , ,,~•.•• ., • * , .
.. ... .. ..... .. .. • • • f • '" , • ,. ••
(II " ." .... • •••
•·•4•·~.. ··'#·· .... ~ ... ,., .. •• tt., • '.. .......  .' ..... ~ .,. . •
....... • .• .If .. " •• , ••
• .. ~ .. ~ t ... , .. It ••• .....4!..... fp .,. f":; ~ • ~ • #: ••. ,,,~  . ,.". • " .. II •
..... , 011' • , .. , ,
, " •• 011' • ~
.... 0 .. 0 , • ,. ..... r ... ,
....... ~0II' ••
•• '011", ~, '
, ... .r , ,.
1 • , ,I • " ,
,~,., •• t~ ••
" ... " •• II .. ,. •
flo ' • 1 " ., " •• 011' ....... "., .'. ...( ..... , .. .. ... ... .. .'" .. ,. .... " .... ;.:«.;o:~·
.. , • , .. It
.'".....'" ..... '" ......'" '" '"'""" "•.•.'".•.•.•.p.• • ·.. ........'". ~ ........,,,.. "' '" '" '" '". r.'I".",.I.·..<t'. ...,
'1 .. 4 ...... " " •••••' •••••••• '':".: ;.:~.~:.«~4.1..... '_f•.. fI...r. .......e...... •••:.'1 '.. ,'.~4! 1Il •••••• " .. If P •••• '" '" ••• •,,:.~~:.:~:.: : 4 ~•••••••••4
_.
'... , ..... '''' '" " '" .. '" '" '" . ".'.·.~.·4".·" .. II • " .
"f •.•,,/' .'"."..jI " . . "' .. '" if •••• ·... '" .'" ....'" . '" '" .." .. ·.... . ... .. " '" ..'" " '"'" • '" ,. p • • • • .. .'" ....'" ... .I".•.••..•, ' ,
Figure 8.1  Indentation of Fe with Diamond using MEAM at the Interface
76
......... ~ .....,. , , .
•
•
•
•
• '" to .
..'.". ,. '•", (. .t.il ,' ' , ... ,. (. . " .
f"l (0 " • .. .. .. • .. • ..
(. f· {.. •.••• 0
f· fl (. {. .. .. .. .. .. .. ..
f' (. ('. . . . • • . 0 t· f. t' fO ..
.. fA • It· ('. • . . • . I • 0 •
• • • e. ~~~~~'c (' .. " . ........ . ~ .":"A. f'.' ."', , .
• • .. • • O'••: • to • • .. I • I , .... . ,..... , .,. . . . . . . .
@I ••••••• • '"1 ~~~.,.f ...... ,0.'. ". "." 0
• .. ... " .. ~, f. .. • .. • .. I • • • •• .,;. fl· I· • • • • • • • , .. .... .it:<:. ''''" (. ," • • '.' .. .. .' ,,'.( ' .
4',( "'( ,•., ,I' t .••
.J.ot .. " ,.. .. • .. • " • .. (. . . ,. (. " , , ..
, '•••••••• 0 t r. , .
• I' " • • • '" • 0 0 .. ~
" · .. ., . . .. .. . .. . . • .. 0 4 •• 4 ..... ·(.. .. . .. . .. .. . ..
f· .
It ,.. ... • • • • .. .. • •
(. (••••••• 4 ...
(. ,.. (' '. .. .. • • .. .. t
(. (. I' •• 0 •••• 0
~ r ~ ~ ~ • .. • .. .. ..
• • • '.(.f.'·(.' ..... , .0 .. ".0, .
• • • C' f' •. 4 • 4 0
• • ,. h ~ P I' .. .. .. .. • • • .•..to· .. ·.•·"f·(·(.l·,. .
• • , •• t· (. ,........•
 ",.~~(,f',,".'." .•.•. ".O'.' I
• • •• (. (t (. ". .. • " • ., .. •
......... (·('r/·'/· . '. '. '. '., '.' .. '.'
(' (, (•••• 4 •• 0 •
I' (. ,.. ,O' " .. .. • .. • ..
(. (. (. " ... 4 .....
r' P ~ • • •  • • • •
(' f' •• 0 • I ......
~ (. ... .. ..  .... (. f' '•••••• '" ...
• p .. .. .. • • • • • •
I' • • .. I • .. .. .. .. .. ·............."......."......
Figure 8.2  Indentation of Fe with Tungsten using MEAM at the Interface
77
..
Near room temperature, these combinations of material are chemicall y
incompatible  that is, there should be little or no interaction. Indentation of the sam
workpiece by an infinitely hard indenter (this is the case here, ince the tool i not
allowed to deform), should give the same result regardless of the material of the indenter,
as long as the materials used are chemically nonreactive at low temperatures.
When indenting iron with diamond, a strong longdistance repulsion exists which
results in a crater larger than the tool. Interaction occurs at a considerable distance. Due
to this strong repulsion, the calculated hardness of the work materials is two orders of
magnitude too great.
When indenting with tungsten, the opposite situation occurs. A strong attraction
occurs which results in too low of a calculated hardness. The work atoms jump into
contact with the tool, and adhere to the tool after retraction. Negligible pile up occur
and a small crater is formed due to the fact that the attraction between tool ,and workpiece
pulls the work atoms back into place as the tool is retracted.
These two extreme cases, either strong repulsion or strong attraction, occurred
with the majority of combinations of materials used. While the e interactions mayor
may not be accurate, it can be concluded that if workpiece effect are of primary interest,
the binary formulation of the MEAM for work/tool interaction is inappropriate.
However, application to alloys may present opportunities not available with other
potential models, and work In
78
this direction IS recommended.
8.3 Indentation with Morse at the Interface
By USIng the MEAM at the interface between the indenting tool and the
workpiece, it became clear that a chemically neutral potential interaction between the tool
and workpiece is desirable. In order to focus on the behavior of the workpiece as
modeled by the MEAM, the Morse potential was implemented at the interface. As
described in previous chapters, this has been the traditional approach when con idering
effects between unlike materials. Morse parameters were used to give an insignificant
attraction between atoms, and only a shortrange repulsion. In essence, thi Morse model
causes atoms to interact in a fashion similar to how macroscale bodies interact  there is
no interaction beyond a simple "shoving" action when bodies contact each other.
This modification to the program was straightforward, due to the simplicity of the
Morse potential. Also, a slight speed increase results. While intraworkpiece bond
dominate, the worktool bonds which do develop take an in ignificant amount of
processing time with the Morse potential as compared with the MEAM.
Table 8.1 summarizes computed strengths of some pure crystals and how how
they compare to theoretical values.
79
Table 8.1  Comparison of MD and Theoretical Strengths for Various Material
Material
Theoretical Strength from
Strength (GPa) MD (GPa)
FCC
Nickel 33.4 28.2
Copper 19.1 20.7
Silver 12.6 11.6
Bee
Iron 31.8 12.3
Molybdenum 54.1 15.0
Chromium 18.4 4.5
The strength values from MD experiments are based on contact area. For the
FCC matL:rials, the MD values are reasonable, but for the BeC materials, the values are
considerably too low. This trend holds for other FCC and Bee materials.
Figures 8.3 and 8.4 show snapshots of the simulations for copper and iron. A
large amount of elastic deformation and recovery is apparent with iron, and little pile up
occurs. With copper, much less elastic deformation is apparent, and a larger crater and
more pile up results. Also with copper, an excessive amount of ubsurface deformation
occurs. Deformation to this extent does not seem reasonable. These trends are found
with other Fee and Bee materials as well.
80
Figure 8.3  Indentation of eu using Morse at the Interface
81
Figure 8.4  Indentation of Fe using Morse at the Interface
82
CHAPTER 9
CONCLUSIONS
9.1 Summary of Results
Understanding of the mechanisms of material removal at sma.ll depths of cut, as is
typical in ultra precision machining and grinding, is limited by technological difficulties
in measuring forces and observing the chip formation process. Molecular dynamics
(MD) simulation has the potential to sidestep these difficulties and provide insight into
these issues in an inexpensive, reliable, and repeatable manner. However, even with
advanced computing resources, MD simulation requires a great deal of computational
time, especially when sophisticated potential model are used, such a the Modified
Embedded Atom Method (MEAM).
The Morse potential, while easily implemented and relatively fast, has a number
of deficiencies, primarily the inability to accurately model BCC materials. The MEAM,
as developed by Baskes [39], offers the potential to avoid these deficiencies, and allow
the modeling of a variety of FCC, BeC, HCP, and diamond cubic materials. The MEAM
also accounts for interactions between unlike materials.
83
To reduce the procesSIng time required by molecular dynamics imulation, a
multipleinstructionmultipledata (MIMD) style parallel proces ing algorithm has been
designed and implemented. A static domain decomposition technique was used, which
assigned a fixed set of atoms to each process in the cluster. All software development
was done in such a manner as to allow easy extension to any potential model. Since
application was on a three node cluster, alltoall message passing was u ed  the
standard methods of dynamic domain decomposition using process topology and load
balancing is not effective with such a small cluster.
To test this new algorithm, computer programs to implement the MEAM have
been developed and validated. Two versions have been developed  a sequential version
and a parallel version. Both give identical results, confirming the validity of the parallel
algorithm. The parallel version offers roughly a 2.25 times speed increase when applied
to the MEAM using a three node distributed computing cluster consisting of Digital
Alpha 500au workstations as compared to the sequential version running on a ingle
workstation of the same type.
For the relaxation process, the conjugate gradient method has been found to
operate much more quickly than an optimized damped trajectory method with the
MEAM. This is due to the fact that the complexity of the MEAM causes computation of
intera~omic forces to be much more time consuming than computation of the total
potential of the system. The conjugate gradient method requires infrequent evaluation of
the local gradient of the potential hypersurface at the expense of addition determinations
84
of system potential. This minimizes the need to call the forces function in the MEAM
code, which is the most time consuming function.
In conjunction, these two timesaving techniques reduced simulation time from
one week to less than two days for a typical MD experiment using the MEAM. With thi
usable simulation code, the MEAM was applied to indentation of pure cry talline
materials at the nanometric level. Use of the MEAM formulation for interactions
between unlike materials at the interface between the tool and workpiece resulted in
unpredictable behavior which dominated the action in the workpiece. Regardles of the
accuracy or inaccuracy of these interactions between unlike materials, it became
immediately obvious that continued used of a chemically neutral form of the Morse
potential at the interface was best for the study of nanoindentation and, most likely,
manufacturing processes simulations in general.
Using Morse at the interface with the MEAM used within the workpiece,
reasonable microhardness values were obtained for FCC materials. For Bee material ,
hardness values were too low. Significant crater formation and pile up were observed
with FCC metals, as was extensive and unrealistic subsurface deformation. With Bee
materials, deformation was largely elastic, and elastic recovery prevented significant
crater formation or pile up.
The superiority of the MEAM over the Morse potential has not been
demonstrated. It seems that the deficiencies of the Morse potential and the strengths of
85
the MEAM are not an issue for situations of interest to the manufacturing proces es
engineer. Indentation, cutting, grinding, and other material removal proce es result in
extreme deviation from the equilibrium lattice configuration and involve the input of
huge amounts of thermal energy into the system of atoms. The embedded atom method
family of molecular dynamics potential models are built on, in part, pure curve fitting to
equilibrium conditions. Testing and validation methods of EAM methods in the literature
do not involve such catastrophic deformation of the simulated material and energy input
such as occurs during the typical material removal process. It is suggested that these are
the reasons for the less than stellar performance of the MEAM.
9.2 Future Work
Application of the parallel processing techniques developed in this study to other
potential models is recommended. With the construction of a larger distributed
computing cluster, parallel implementation of the Morse potential would allow
simulation of very large systems of atoms. This would allow investigation into a whole
new class of problems, and could be accomplished with a minimum of effort. The
MIMD parallel algorithm developed in this study has been written in a modular fashion,
to allow easy extension to other potential models and other manufacturing processes,
such as cutting or grinding.
Use of the conjugate gradient technique for relaxation is also recommended for
any mathematically complex potential model. For any potential where computation of
86
forces is significantly slower than computation of potential, the conjugate gradient
method offers the potential for very large savings in computer time.
More work is recommended to reach a firm conclusion on the uitability of the
MEAM for MD simulation of indentation and manufacturing proce es in general. The
mixed results found in this study make accurate commentary difficult. While it has been
demonstrated that the binary formulation is not suitable for interactions between
physica.lly separate crystals, it may be useful for alloys.
87
REFERENCES
[1] Chadrupatla, T. R., Belegundu, A. D., Introduction to Finite Elements in
Engineering, 1997, Prentice Hall, New Jersey.
[2] Chandrasekaran, N., Length Restricted Molecular Dynamics (LRMD) Simulation of
Nanometric Cutting, 1997.
[3] Komanduri, R., Chandrasekaran, N., Raff, L. M., "Molecular Dynamics (MD)
Simulation of Uniaxial Tension".
[4] Daw, M.S., Baskes, M. I., "EmbeddedAtom Method: Derivation and Application to
Impurities, Surfaces, and other Defects in Metals", 1983, Phys. Rev. B., 29 12, pp
6443.
[5] Adams, J. B., Foiles, S. M., "Development of an EmbeddedAtom Potential for a
BCC Metal: Vanadium", 1989, Phys. Rev. B., 416 pp 3316.
[6] Baskes, M. I., "Modified EmbeddedAtom Potentials for Cubic Materials and
Impurities", 1991, Phys. Rev. B., 46 5, pp 2727.
[7] Siaets, F. W., Travieso, G., "Parallel Computing: A Case Study", 1989, Compo
Phys. Comm., 56, pp 63.
[8] Woods, M. B., Alford, C. 0., "Scaleable Parallel Model Development and
Performance Analysis for MIMD Molecular Dynamics Simulations", 1995, Compo
Elect. Engng., 21,3, pp 159.
[9] Heyes, D. M., Smith, W., Inf. Q. Comput. Simulation Condensed Phases, 1987,26,
68.
[10] Rapaport, D. c., Compo Phy~. Comm., 1988,9, 1.
[I I] Everaers, R., Kremer, K., "A Fast Grid Search Algorithm for Molecular Dynamcics
Simulations with ShortRange Interactions", 1994, Compo Phys. Comm., 81, pp. 19.
[12] Liem, S. Y., Brown, D., Clarke, J. H. R., "Molecular Dynamics Simulations on
Distributed Memory Machines", 1991, Compo Phys. Comm, 67, pp 261.
8H
[13] Srinivasan, S. G., Ashok, I., Jonsson, H., Kalonji, G., Zahorjan, J., "Parallel ShortRange
Molecular Dynamics Using the Adhara Runtime System", 1997, Compo Phys.
Comm., 102, pp 28.
[14] Srinivasan, S. G., Ashok, I., Jonsson, H., Kalonji, G., Zahorjan, J., "Dynamic
Domain Decomposition Parallel Molecular Dynamics", 1997, Compo Phys. Comm.,
102, pp 44.
[15]Geist, A, Beguelin, A, Dongarra, J., Jiang, W., Manchek, R., Sunderam, V., PVMA
User's Guide and Tutorial for Networked Parallel Computing, 1997, Fourth
Edition, MIT Press.
[16] Gropp, W., Lusk, E, Skjellum, A., Using MPI: Portable Parallel Programming with
the Message Passing Interface, 1999, Second Edition, MIT Press.
[17] Snir, M., Otto, S., HussLederman, S., Walker, D., Dongarra, J., MPI  The
Complete Reference: Volume 1, The MPI Core, 1999, Second Edition, MIT Press.
[18] Gropp, W., HussLederman, S., Lumsdaine, A., Lusk, E., Nitzberg, B., Saphir, W.,
Snir, M. MPI  The Complete Reference: Volume 2, The MPI Extensions, 1999,
First Edition, MIT Press.7
[19] Alder, B., Wainwright, T., "Studies in Molecular Dynamics I. General Method",
1959, J. Chern Phys., 31, pp 459.
[20] Alder, B., Wainwright, T., "Studies in Molecular Dynamics II. Behavior of a Small
Number of Elastic Spheres", 1960,1. Chern Phys., 33, pp 1439.
[21] Hoover, W. G., De Groot, A J. Hoover, C. G., Stowers, I. F., Kawai, T., Holian, B.
L., Boku, T., lhara, S., Belak, J., "Large Scale ElasticPlastic Indentation
Simulations Via NonEquilibrium Molecular Dynamics", 1990, Phys. Rev. A, 42, pp
5844.
[22] Hertzberg, R. W., Deformation and Fracture Mechanics of Engineering Materials,
1996, Fourth Edition, John Wiley & Sons, Inc.
[23] Belak, 1., Boercker, D. B., Stowers, I. F., "Simulation of Nanometer Scale
Deformation of Metallic and Ceramic Surfaces", 1993, MRS Bull., 18, pp 55.
[24] Landman, U., Luedtke, W. D., Burnham, N. A., and Colton, R. J., "Atomistic
Mechanisms and Dynamics of Adhesion, Nanoindentation, and Fracture", 1990,
Science, 24, pp 454.
[25] Landman, U., Luedtke, W. D., and Ringer, E. M., "Atomistic Mechanisms of
Adhesive Contact Formation and Interfacial Processes", 1991, Wear, 153, pp 3.
[26] Rentsch, R., lnasaki, I., "Molecular Dynamics Simulation for Abrasive Proce
1994, Annals CIRP, 43, pp 327.
[27] Brenner, D. W., Sinnott, S. B., Harrison, J. A, Shenderova, O. A., "Simulated
Engineering of Nanostructures", 1996, Nanotechnology, 7, pp 161.
e ",
[28] Yan, W., Komvopoulos, K., "ThreeDimensional Molecular Dynamics Analysis of
AtomicScale Indentation", 1998, J. of Trib., 120, pp 385.
[29] Komanduri, R., Chandrasekaran, N., Raff, L. M., "MD Simulation of Indentation and
Scratching of Single Crystal Aluminum", 2000, Wear, 240, pp 113.
[30] Johnson, R. A, "Empirical Potentials and Their Use in the Calculation of Energies of
Point Defects in Metals", 1973, J. Phys. F., 3, pp 295.
[31] Stott, M. 1., Zaremba, E., "Quasiatoms: An Approach to Atoms in Nonuniform
Electronic Systems", 1980, Phys. Rev. B., 22, pp 1564.
[32] Puska, M. J., Nieminen, R. M., Manninen, M., "Atoms Embedded in an Electron
Gas: Immersion Energies", 1981, Phys. Rev. B., 24, pp 3037.
[33] Daw, M. S., Baskes, M. I., "Embedded Atom Method: Derivation and Application
to Impurities, Surfaces, and Other Defects in Metals", 1984, Phys. Rev. B., 29, pp
6443.
[34] Foiles, S. M., Baskes, M. I., Daw, M. S., "Embedded Atom Method Functions for
the FCC Metals Cu, Ag, Au, Ni, Pd, Pt, and Their Alloys", 1986, Phy . Rev. B., 33,
pp7983.
[35] Baskes, M. I., "Application of the Embedded Atom Method to Covalent Materials: a
Semiempirical Potential for Silicon", 1987, Phys. Rev. Let., 59, 23, pp 2666.
[36] Oh, D. 1., Johnson, R. A., "Simple Embedded Atom Method Model for FCC and
HCP Metals", 1988, J. Mater. Res., 3, 3, pp 471.
[37] Johnson, R. A, Oh, D. 1., "Analytic Embedded Atom Method Model for BCC
Metals'''' 1. Mater. Res., 4,5, pp 1195.
[38] Baskes, M. l., Nelson, J. S., Wright, A F., "Semiempirical Modified Embedded
Atom Potentials for Silicon and Germanium", 1989, Phys. Rev. B., 40, pp 6085.
[39] Baskes, M. 1., "Modified Embedded Atom Potentials for Cubic Materials and
Impurities", 1991, Phys. Rev. B., 46, pp 2727.
[40] Raff, L. M., Molecular Dynamics Modeling, Lecture Notes.
90
[41] Riley, M. E., Coltrin, M. E., Diestler, D. J., 1988, J. of Chern. Phys., 88, pp 5943.
[42] Baskes, M. I., "Determination of Modified Embedded Atom Method Parameters for
Nickel", 1997, Mat. Chern. Phys., 50, pp 152.
[43] Hagan, M. T., Demuth, H. B., Neural Network Design, 1996, First Edition, PWS
Publishing Company.
91
VITA
David L. Stokes
Candidate for the Degree of
Master of Science
Thesis: PARALLEL PROCESSING OF MOLECULAR DYNAMICS SIMULATION
IN A DISTRIBUTED COMPUTING ENVIRONMENT AND APPLICATION
TO THE MODIFIED EMBEDDED ATOM METHOD AND
NANOINDENTATION
Major Field: Mechanical Engineering
Biographical:
Education: Graduated from Booker T. Washington High School in 1992;
received Bachelor of Science degree in Mechanical Engineering from
Oklahoma State University in December 1998; completed the requirements
for the Master of Science degree in Mechanical Engineering at Oklahoma
State University in December 2000.
Experience: Research Assistant for Dr. Ranga Komanduri at Oklahoma State
University from May 1993  December 2000.