
  SPhot - Monte Carlo Transport Code

*Privacy & Legal Notice* <http://www.llnl.gov/disclaimer.html>
------------------------------------------------------------------------


    Contents

    * Code Description 
    * Files in this Distribution 
    * Building the Code 
    * Running the Code 
    * Timing Issues 
    * Expected Results  

------------------------------------------------------------------------


    Code Description


      A. General Description

SPhot is a 2D photon transport code. Photons are born in hot matter and
tracked through a spherical domain that is cylindrically symmetric on a
logically rectilinear, 2D mesh. Many application codes at LLNL implement
some form of this basic algorithm. The version implemented here exploits
both MPI and OpenMP parallelism. The code is written in Fortran77.

Monte Carlo transport solves the Boltzmann transport equation by
directly mimicking the behavior of photons as they are born in hot
matter, move through and scatter in different materials, are absorbed or
escape from the problem domain.

The logically rectilinear, 2D mesh in which particles are tracked, is
internally generated. The mesh is small enough that a complete copy of
the mesh will not only fit on each node in a parallel machine, but also
fit into cache memory in most modern CPUs. Thus, this benchmark does not
stress memory access.

Particles are born with an energy and direction that are determined by
using random numbers to sample from appropriate distributions. Likewise,
scattering and absorption are also modeled by randomly sampling cross
sections. The random number generator used is implemented in the code
using integer arithmetic in a way that makes the resulting pseudo-random
number sequence portably reproducible across different machines.

The use of random numbers makes the code's output (edit variables)
"noisy." This noise is a direct result of the discrete nature of the
simulation. The level of the noise can be reduced by increasing the
number of particles that are used in the simulation. Unfortunately, the
level of noise in the answer decreases only very slowly with increasing
computational effort. The noise is inversely proportional to the square
root of the number of particles. If the noise is to be reduced to 1% of
the value in a given simulation, it is necessary to run 10,000 times as
many particles. Thus, high-quality (low-noise) simulations can become
very computationally expensive. Parallelism is an obvious way to
increase the number of particles.


      B. Coding

With the exception of a single routine (rdinput.f) that includes
Fortran90 derived type usage, SPhot is coded entirely in Fortran77.
Parallelism is implemented through the use of both MPI and OpenMP
libraries.


      C. Parallelization

SPhot falls into the category of "embarrassingly parallel" applications.
Every CPU that is employed in the computation works on a local copy of
the 2D mesh (most likely stored in cache), generates its own random
numbers and performs its own particle trackings. For the most part,
tasks do not exchange data with each other. The minimal communications
that do occur take place between the "master" MPI task and all other MPI
tasks for the purposes of distributing input data, updating global
variables and collecting timing statistics. Additionally, several
synchronization constructs are necessitated at both the MPI and OpenMP
level.

Parallelism in SPhot is implemented as a mixed-mode/hybrid model. It is
required that both MPI and OpenMP be used during execution, however
there is much flexibility for the user to choose how many MPI tasks and
OpenMP threads are used. In the most general sense, the MPI parallelism
is designed to provide communication between separate, distributed
memory machines, which may or may not be SMPs. Each machine hosts an MPI
task that oversees the OpenMP threaded execution of SPhot on that
machine, and facilitates required communications with MPI tasks on other
machines. However, having only one MPI task per machine is entirely up
to the user. Multiple MPI tasks can be run on a machine if this is
desired. Or, if the machine architecture employs global address space,
only a single MPI task may be required for the entire execution.

The OpenMP parallelism is strictly "on-node" (shared memory) execution
for SMP machines. Every OpenMP thread is "owned" by a single MPI task
with a ratio of /n:1/, where /n/ varies with the user-specified number
of OpenMP tasks. As with the number of MPI tasks, there is no explicit
specification within SPhot for the number of OpenMP threads which must
be used. Instead, these are specified by the user through the use of
environment variables or command line flags at run time. In this manner,
the same code can adapt to a diverse set of architectures and conditions.

------------------------------------------------------------------------


    Files in This Distribution

The following files are included in this distribution of SPhot:

      Subdirectory 	Files 	Comments
        	Makefile 	Main makefile for SPhot
      opac.txt 	Required opacity library
      input.dat 	Input file
      bin 	(none) 	Initially empty - used to keep objects after
      compilations
      includes 	geomz.inc
      globals.inc
      params.inc
      pranf.inc
      randseed.inc
      shared.inc
      times.inc 	Required include files
      src 	Makefile.src
      allocdyn.f
      copyglob.f
      copypriv.f
      copyseed.f
      execute.f
      genmesh.f
      genstats.f
      genxsec.f
      interp.f
      iranfeven.f
      iranfodd.f
      plnkut.f
      pranf.f
      ranf.f
      ranfatok.f
      ranfk.f
      ranfkbinary.f
      ranfmodmult.f
      rans.f
      rdinput.f
      rdopac.f
      second.f
      seedranf.f
      sphot.f
      thom.f
      writeout.f
      wroutput.f
      zonevols.f 	All Fortran source files and required makefile

------------------------------------------------------------------------


    Building the Code

Building SPhot is fairly simple and straightforward:

   1. *Perform Makefile Modifications*

      Two makefiles are used to build the code:

      Makefile
      src/Makefile.src

      Both of these makefiles will require manual modifications to
      several parameters prior to attempting to build the code. The
      parameters which require manual modification are clearly marked
      with a "***Modifications Required*** notice, and include such
      things as choice of compiler, compiler flags, library directories
      and libraries. Note that with the exception of the rdinput.f file,
      all source files are Fortran77. The rdinput.f routine uses Fortran
      90 derived data type syntax and will therefore require a compiler
      that can handle such.

      It is important to mention that this code is implemented with both
      MPI and OpenMP. Specifying the necessary platform dependent
      parameters to enable both of these is essential.

      Additionally, both makefiles permit the specification of C
      language parameters, even though SPhot itself does not use any C
      files. The reason for this is merely convenience in case there is
      a desire to introduce C language profilers, timing tools or
      something similar.

   2. *Modify Global Parameters*

      SPhot has a number of parameters which are "hardcoded" in its
      various include files. In most cases, these do not require
      modification. However, it is quite possible that for larger runs,
      several of these parameters may be too small and require
      modification. Each of these is discussed in the table below.

      Parameter 	Where Located 	Instructions
      *maxruns* 	includes/params.inc 	Establishes the maximum value for
      the Nruns input file parameter. Must be at least equal to (Number
      of MPI tasks * Number of OpenMP threads). Default value is 65537.
      Requires source code to be recompiled if modified.
      *MaxStreams* 	includes/pranf.inc 	Must be equal to or greater than
      the previous maxruns parameter. Must always be an odd number.
      Default value is 65537. Requires source code to be recompiled if
      modified.
      *maxMPItasks* 	includes/times.inc 	Establishes the maximum number
      of MPI tasks. Default value is 16384. Requires source code to be
      recompiled if modified.
      *maxThreadsPerMPItask* 	includes/times.inc 	Establishes the
      maximum number of OpenMP threads per MPI task. Default value is
      128. Requires source code to be recompiled if modified.

   3. *Make*

      In the main directory, simply issue the make command. This will
      invoke the primary Makefile, which in turn will invoke the
      src/Makefile.src makefile. The files in the src subdirectory will
      then be compiled and their objects copied to the bin directory.
      The sphot executable will be created and copied into the main
      directory.

   4. *Cleanup*

      Invoking the command make clean in the main directory will remove
      all object files in the bin subdirectory and the sphot executable.

------------------------------------------------------------------------


    Running the Code

Several steps are required to execute SPhot. These are listed below.

   1. *Set the Nruns Input File Parameter*

      The distribution input file input.dat specifies a number of
      parameters that control the execution of SPhot. It is usually
      necessary to change only one of these - the Nruns parameter. Nruns
      represents the "number of runs" that should be conducted. It must
      be evenly divisible by the (Number of MPI tasks * Number of OpenMP
      threads). To change Nruns, simply edit the input.dat file. Nruns
      appears as the single integer value on the first line of the file.

      Note that the default maximum value for Nruns is 65537. If this
      value is too small and must be increased, then an include file
      must be modified and the source recompiled. See the Building the
      Code  section for additional instructions.

      The three tables below show:

          * The raw format of input.dat upon distribution
          * A description of each input.dat parameter
          * Examples of how the Nruns parameter may be changed. 

      *Raw format of input.dat file* (Nruns parameter in blue)



        *16*

BENCHMARK CASE NO THOMS, NO RR/SPLIT: 49X40, BWGT=3.12e+14

         1         0

      4000         0         0         0         0         0         0

        49        40         2

  1.00e-10  7.50e-01  1.00e-12  1.05e+00

  1.80e+02  1.00e-03  5.00e-01  3.12e+12

         1       980

       981      1960

         3 4.000e-01 1.000e+03 2.000e+04

         2 5.000e-01 1.000e+02 1.000e+03
      
         0

      *Description of input.dat parameters*
      *

Field    Format   Comments

Name

      *



Nruns     i4     Number of runs. Should be evenly divided by the 

                 number of MPI tasks and OpenMP threads.

title     20a4   Title information

ilib      i10    opacity library format: 0=binary 1=formatted

illib     i10    bin library usage:    

                 0=use old bin.lib 1=form new bin.lib

npart     i10    # of particles

igroup    i10    energy bins (0=12, 1-12=1, 13=ross.mean)

ixopec    i10    opacity (0=library, 1=input,2=data)

isorst    i10    source (1=uniform in sphere, 2=plankian)

irr       i10    r-roulette/split (0=none, 1=impt, 2=size)

ithom     i10    thomson scattering (0=not used, 1=used)

icross    i10    print cross sections (0= no, 1= yes)

naxl      i10    number of axial meshes

nradl     i10    number of radial meshes

nreg      i10    number of material regions

dtol      e10.2  tolerance to cell boundaries (cm)

wcut      e10.2  low weight cutoff

tcen      e10.2  time to census (sec)

xmult     e10.2  weight mult. for russian roulette

axl       e10.2  portion of sphere analyzed (degrees)

radl      e10.2  sphere radius (cm)

opec      e10.2  input opacity (1/cm)

bwgt      e10.2  bundle weight (kev)

matb,mat  i10    material region begin/end pair.  One pair for each of 

                 nreg materials

mtl       i10    Material - one value for each of nreg materials. 

                 1=h 2=sio2 3=dt 4=c

atrat     e10.3  Atomic Ratio - one value for each of nreg materials

dns       e10.3  Density - one value for each of nreg materials

tmp       e10.3  Temperature - one value for each of nreg materials

prtflg    i5     Print flag to select level of output( see summary page for detail)

      *Examples of how the Nruns parameter might be modified*
      Number of MPI Tasks 	Number of OpenMP
      Threads per MPI task 	Possible Values for NRuns
      1 	1 	1, 2, 3...
      1 	16 	16, 32, 64...
      1 	1000 	1000, 2000, 3000...
      16 	4 	64, 128, 256...
      256 	4 	1024, 2048, 3072...
      1000 	1 	1000, 2000, 3000...

   2. *Specify the Number of MPI Tasks*

      The method for doing this will vary from platform to platform but
      will be typical for a given platform. For example, an IBM RS/6000
      SP platform may use POE environment variables, such as MP_PROCS,
      MP_NODES or MP_TASKS_PER_NODE. Alternately, the Compaq Tru64 Alpha
      platform may use the prun -n command.

      Note that the optimal number of MPI tasks to use on a platform
      will also vary, a determination which is left up to the user. In
      no case should the number of MPI tasks exceed the total number of
      processors/cpus available.

      The default maximum number of MPI tasks is 1024. If this value is
      too small, then an include file must be modified and the source
      recomplied. See the Building the Code section for additional instructions.

   3. *Specify the Number of OpenMP Threads per MPI Task*

      Specifying the number of OpenMP threads is done according to the
      OpenMP standard by using either the platform default (usually the
      number of cpus on a machine), or explicitly with the
      OMP_NUM_THREADS environment variable. As with the number of MPI
      tasks, determining the optimal number of OpenMP threads is left up
      to the user, and will vary from platform to platform.

      The default maximum number of OpenMP threads per MPI task is 128.
      If this value is too small, then an include file must be modified
      and the source recomplied. See the Building the Code
      section for additional instructions.

   4. *Invoke the Executable*

      This will vary from platform to platform also, especially if a
      "batch scheduling" system is used to run jobs. In the most simple
      case, issuing a command similar to that shown below would work:

        prompt> sphot <cr> 

        where:

      sphot is the executable

      It is assumed that the input.dat file and opac.txt file are in the
      same directory where sphot is invoked.

------------------------------------------------------------------------


    Timing Issues

All timings are obtained through the use of the portable MPI wall clock
timing routine, MPI_Wtime(). Instrumentation occurs in several locations
within the code, most importantly around the computational loop. Some
instrumentations are trivial and one is actually unnecessary (the timing
of the allocdyn.f routine).

The execution time for one "run" of SPhot is platform dependent, but in
any case does not require much time--usually only "seconds", and
certainly much less than a minute on all current architectures tested.
Most architectures tested demonstrated execution times of 10-20 seconds
per run. Compiler optimization should improve execution times in most
cases.

Every task/thread maintains its own timings, which are collected and
maintained globally by the "master" MPI task (rank=0). In this sense,
the instrumentation of SPhot actually imposes some overhead. As the
number of MPI tasks increases, so does the ratio of communication time
to execution time. Since SPhot requires so little execution time per run
to begin with, MPI overhead may gradually reduce the application's
scalability for many-task runs. The associated MPI_Barrier calls
required for synchronization (most notably in copypriv.f) will also
incur additional overhead as the number of MPI tasks increase.

The most important timing parameters are the loop speedup, program
speedup and efficiency computations that appear at the end of the
application's output (example output is available in the "Expected
Results" section). These values reflect the scalability of the
application on a given platform.

Please see the Summary writeup for information about timing data to be
reported.

------------------------------------------------------------------------

Expected Results

Please see the Summary writeup for information about expected results and
reporting of data.


Version 0.9.1

For more information about this page, contact:
Tom Spelce, spelce1@llnl.gov

<http://www.llnl.gov/>

*UCRL-MI-144211*
September 19, 2001

