Compiling Jobs.

Big Red

Two compiler suites:  IBM and GCC


IBM Visual Age Compilers:

1.  Generally produce faster executables than gcc at the same level of optimization.

2.  Adhere a little more strictly to the language standards than gcc.

3.  One basic compiler, with different front ends for each language (C, C++, Fortran variants).  So, many optimization switches are the same across all supported languages.

4.  A note about the environment variable "OBJECT_MODE":  it sets the default addressing mode (32 or 64 bit) for the compilers.  If the switches "-q32" or "-q64" are used on the compile line, they override the value of OBJECT_MODE.
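For example, one way to make 64-bit the default for a login session (a minimal bash sketch; "-q32" or "-q64" on the compile line still wins):

    export OBJECT_MODE=64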


C:

 
The compiler is called "xlc."  There is also a cc compiler; on AIX systems it is embedded in the OS and is K&R C, so unexpected things can happen if it is used.  On Big Red, cc has been soft-linked to gcc (see above).  The preferred C compiler is xlc.  I group compiler switches into three categories: machine switches, user switches, and optimization switches.

Machine switches: 
  
These switches are always the same for the platform.  For Big Red, "-qarch=ppc970 -qtune=ppc970 -qenablevmx -qaltivec" is a good choice.  If the AltiVec units go unused, there is no penalty for setting the switches.
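For illustration, a compile using these machine switches might look like the following (mycode.c is just a placeholder file name):

    xlc -qarch=ppc970 -qtune=ppc970 -qenablevmx -qaltivec -o mycode mycode.c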

User switches:
 
These switches vary with the job.  For instance, if addressing above 2 GB is expected, then a 64-bit compile will be needed with the "-q64" switch.  If the C preprocessor is all that is desired, then the "-E" switch should be used.  Please note that the general rule of thumb with compiler switches is that switches prefixed with "-q" are sent to the compiler, while those prefixed with "-b" are sent to the loader.  Some switches (like "-E" and "-o") are exceptions due to historical precedent.
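Two quick sketches (mycode.c is a placeholder file name):

    xlc -q64 -o mycode mycode.c      # 64-bit compile for addressing above 2 GB
    xlc -E mycode.c > mycode.i       # run only the C preprocessor and capture the output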

Optimization switches:
  These are the switches most people think about when they think about compiler switches.  Optimizations can be either positive (faster executables) or negative (slower executables).  Some switches, like "-Q", are almost always positive ("-Q" attempts to place functions in line with the main body and remove the overhead of the function call).  Others, like "-g", are almost always negative ("-g" creates debug symbol tables).  Various levels of "-O" are offered.  Without a number following, "-O" defaults to "-O2."  Generally, the higher the number, the more risks the compiler is willing to take.  At "-O0" absolutely no optimization occurs.  At "-O5" everything from loop fusion to cross-source (multiple source file) optimization is considered.  Often, more is not better.  Testing and experimenting is about the only way to properly tune an executable.
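Putting the three categories together, a typical Big Red compile line might look something like this (mycode.c is only a placeholder):

    xlc -qarch=ppc970 -qtune=ppc970 -qenablevmx -qaltivec -q64 -O3 -Q -o mycode mycode.c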

On Big Red, even if you have MANPATH problems,  the command:

man -M /opt/ibmcmp/vacpp/8.0/man/en_US/ xlc

will find the xlc man page for other options.

C++:

The IBM compiler for C++ codes is called xlC.  This compiler used to have real trouble with many C++ source codes.  It has improved (as has C++ standardization) quite a bit in the last several years.  Almost all of the switches available to xlc are also available with xlC.  In addition, many switches have been added to help with gcc compatibility.  One example is "-qlanglvl=gnu_complex."  This switch "instructs the compiler to recognize GNU complex data types and related keywords."

The man path is the same as for xlc.

Fortran:

Support for FORTRAN IV ('66) was recently dropped.  There is no excuse for FORTRAN 77 any more either; use "xlf90 -qfixed" in its place.  A few examples of dropped syntax include computed GOTOs.

Support for Fortran 2003 is nearly complete.

IBM "invented" FORTRAN.  So, they see no reason files should not always end ".f." If you have a file with a ".f90" suffix (or any other
for that matter)  use the "-qsuffix" switch.  To convert .f90  it would be:  -qsuffix=f=f90."
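For instance (myprog.f90 is a placeholder file name):

    xlf90 -qsuffix=f=f90 -o myprog myprog.f90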

Machine switches: 
  
For Big Red:  "-qarch=ppc970 -qtune=ppc970"  are the sae as xlc/C.  However,  the altivec switches are not directly available.  Altivec enabled C routines (such as FFTW) .  are linked to the Fortran programs.

User switches:
 
For Fortran, a 64-bit compile is highly recommended in most cases.  Fortran assumes memory management responsibilities, so why not use them?  Once again, 64-bit compilations are designated with the "-q64" switch.  In addition, "safety" switches, such as "-C" for run-time checking of array bounds, are also available.

Optimization switches:
  The xlf90 optimization switches are nearly identical to the xlc optimizations.  Once again, switches like "-Q" are almost always positive and switches like "-g" are almost always negative.  The same levels of "-O" are offered, but with slightly different effects.  One throttle on risky optimizations is the "-qstrict" option.  It only matters at "-O3" or higher.  Generally, peace of mind comes at a significant performance cost.  The warning messages are very prolific.  Again, testing and experimenting is about the only way to properly tune an executable.
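For example, a cautious high-optimization Fortran compile might look like this (myprog.f is a placeholder):

    xlf90 -q64 -qarch=ppc970 -qtune=ppc970 -O3 -qstrict -o myprog myprog.f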

On Big Red, even if you have MANPATH problems,  the command:

man -M /opt/ibmcmp/xlf/10.1/man/en_US/  xlf90

will find the xlf90 man page for other options.

Threads and System Math Libraries:

  On all the IBM compilers, adding a "_r" suffix enables threads.  An example: xlC_r.  Often threads that run "behind the scenes" need enabling.

An example is compiling and linking in IBM's Engineering and Scientific Subroutine Libraries (ESSL).  If using actual threads (like OpenMP or POSIX), then the compile switch is "-qesslsmp."
Even serial code requires the "_r" suffix, as in "xlf90_r -q64 -qessl..." or "xlc_r -qessl..."  For xlC_r, you must also add "-lessl -qnocinc=/usr/include/essl" to redefine the include files.
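As a rough sketch of a serial ESSL compile and link (myprog.f is a placeholder; check the ESSL documentation on Big Red for the exact library names on your system):

    xlf90_r -q64 -qessl -lessl -o myprog myprog.f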


GCC:

1.  Default is version 3.3.3.  Version 4.2.2  is available in SoftEnv.

2.  Works the same as gcc everywhere else.

3.  64-bit compilations are possible with the "-mpowerpc64" compile switch.

4. Some other handy compiler options for Big Red:

        A.  -mcpu=970  Identifies our processor type.

        B.  -mabi=altivec  Needed to enable the next switch (-maltivec).

        C.  -maltivec  For C codes using 4-byte real numbers, the compiler will attempt to vector-stream (vectorize) the computations.

        D.  -mfused-madd  Turns on the IBM-specific "multiply-add" instruction during optimizations (where possible).
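Putting these together, a gcc compile line on Big Red might look roughly like the following (mycode.c is just a placeholder file name):

    gcc -mcpu=970 -mabi=altivec -maltivec -mfused-madd -O2 -o mycode mycode.c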

GCC is on Quarry too:

1.  Default is version 3.4.6.  Version 4.7.0 is available in SoftEnv.

 Machine switch:

The switch "-march=nocona" turns on 64-bit extensions as well as SSE vector instructions.

Optimization Switches:

1.  The default is no optimizations at all.

2.  Many options are known to slow the executable; this is often mentioned in the man page.

Example (from the man page):

    -funroll-all-loops
           Unroll all loops, even if their number of iterations is uncertain when the
           loop is entered.  This usually makes programs run more slowly.
           -funroll-all-loops implies the same options as -funroll-loops.

3.  The macro "-O2" is structured not to increase the executable's size.

4.  The macro "-O3" is usually a good idea.  It turns on many options:

            -fforce-mem -foptimize-sibling-calls -fstrength-reduce -fcse-follow-jumps
            -fcse-skip-blocks -frerun-cse-after-loop -frerun-loop-opt -fgcse -fgcse-lm
            -fgcse-sm -fgcse-las -fdelete-null-pointer-checks -fexpensive-optimizations
            -fregmove -fschedule-insns -fschedule-insns2 -fsched-interblock -fsched-spec
            -fcaller-saves -fpeephole2 -freorder-blocks -freorder-functions
            -fstrict-aliasing -funit-at-a-time -falign-functions -falign-jumps
            -falign-loops -falign-labels -fcrossjumping -finline-functions -fweb
            -frename-registers and -funswitch-loops

5.  "-xT" is usually a good architecture choice (note: this is an Intel compiler switch; see the next section).
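Pulling the Quarry GCC switches together, a compile line might look roughly like this (mycode.c is just a placeholder):

    gcc -march=nocona -O3 -o mycode mycode.c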

 

Native Intel compiler:

1.  Started with GCC as its base switch set, beginning with Intel version 8.

2.  Emphasis is with ease of use.

3. Generally about 1/3 faster code is generated with the same amount of effort.

The C compiler is icc, the C++ compiler is icpc, and the Fortran compiler is ifort.

1.  Addressing size is determined by either “-m64” or “-m32” switches.

2.  A good optimization switch to try is "-fast."  This implies -O3, -ipo, -static, -no-prec-div, and -xhost.  Of course, "-O3" is a macro itself; it implies loop transformation as well as many others.  Sometimes this can be trouble.  If so, try "-O2" and the rest of the above switches.
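As a sketch of the fallback suggested above (myprog.f90 is a placeholder file name):

    ifort -m64 -O2 -ipo -static -no-prec-div -xhost -o myprog myprog.f90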

 

A note on C++ (icpc)/GCC interoperability from the man page:

C++ compilers are interoperable if they can link object files and libraries generated by one compiler

with object files and libraries generated by the second  compiler,  and the resulting executable runs

successfully. Some GNU gcc* versions are not interoperable, some versions are interoperable.

By default, the Intel compiler will  generate code that is interoperable with the version of gcc it finds

on your system.

Portland Group compiler:

The third party compilers from the Portland Group are installed on Quarry.  This compiler suite is useful when
software packages (often built on other platforms) support it but not Intel.  Some users like the idea of a compiler
that is supported across platforms (like GCC) but still provides professional support and bug fixing.

Compilation precision considerations

Addressing size often has no direct impact on precision.  Some compilers do default to higher precision because pointers are the address size and some C codes declare integers as pointers, but this is arbitrary.  The most common default is 4-byte (32-bit) precision.  How big a deal is that?  Let's investigate.  For many codes, like popular molecular dynamics modelers or finite difference codes, 12 bits of mathematical precision are necessary.  Often the wind speed in meteorological codes was originally only measured with 4 digits of precision.

A computer is an imprecise machine.  Just adding and subtracting will cause errors.  Below is a simple code.  It creates and writes out 1,000,000 ASCII numbers to a file.  It then adds them forward {1, 2, 3 … 1,000,000} and adds them backwards {1,000,000, 999,999, 999,998 … 1} and prints the results.

      

!     Round-off demonstration: fill an array with scaled random values,
!     write them to a file, then sum them forward and backward.
      program time
      real*4 temp, paste
      real*4 A(1000000)
      real*4 sumf, sumb
      integer j
      integer counter

      counter = 1000000
      temp = 0.0
      sumf = 0.0
      sumb = 0.0
      open(unit = 10, file = 'data.dat' ,status = 'unknown')
      paste = 1.0 / 77.
      call random_seed()
!     Fill the array with random numbers scaled by "paste"
      do 30 j = 1, counter
      call random_number(temp)
      temp = temp * paste
      A(j) = temp
  30  continue
!     Make every 7th element very large ...
      do 50 j = 1, counter, 7
      A(j) = 785454654 * A(j)
  50  continue
!     ... and scale every 3rd element way down, spreading the magnitudes
      do 70 j = 1, counter, 3
      A(j) = A(j) * 0.0000000001
  70  continue
!     Write the values out and sum them in forward order
      do 90 j = 1, counter
      write(10, 22) A(j)
      sumf = sumf + A(j)
  22  format( f48.25)
  90  continue
      close(10)
!     Sum the same values in reverse order
      do 80 j = counter, 1, -1
      sumb = sumb + A(j)
  80  continue
      print *, "forward sum is = ", sumf
      print *, "reverse sum is = ", sumb
      end program

So, let's look at the results.

32-bit precision:

/ray/round_error> ./round_32.out

 forward sum is =  0.4852211057E+12

 reverse sum is  =  0.4852207124E+12

 

First discrepancy is in the 6th digit.

 

64-bit precision:

/ray/round_error> ./round_64.out

 forward sum is =  485220744826.172424

 reverse sum is   =  485220744826.173584

 

First discrepancy is in the 15th digit.

 

128-bit precision (IBM extension to the language):

/ray/round_error> ./round_128.out

 forward sum is =  485220744826.18382927712407234600134

 reverse sum is   =  485220744826.18382927712407234630628

 

First discrepancy is in the 31st digit.

Simple OpenMP  (so simple, let's just do it...)

In our summing program, we had a lot of independent loops.  So, why not parallelize them with OpenMP?
We only need to add three new lines to our serial code and we get 4 processors!

      program time
      real*4 temp, paste
      real*4 A(1000000)
      real*4 sumf, sumb
      integer j
      integer counter

      call OMP_SET_NUM_THREADS(4)

      counter = 1000000
      temp = 0.0
      sumf = 0.0
      sumb = 0.0
      open(unit = 10, file = 'data.dat' ,status = 'unknown')
      paste = 1.0 / 77.
      call random_seed()

!$OMP PARALLEL PRIVATE(j)

      do 30 j = 1, counter
      call random_number(temp)
      temp = temp * paste
      A(j) = temp
  30  continue

      do 50 j = 1, counter, 7
      A(j) = 785454654 * A(j)
  50  continue

       do 70 j = 1,counter , 3
       A(j) = A(j) * 0.0000000001
  70  continue

!$OMP END PARALLEL

      do 90 j = 1, counter
      write(10, 22) A(j)
      sumf = sumf + A(j)
  22  format( f48.25)
  90  continue
      close(10)
      do 80 j = counter, 1, -1
      sumb = sumb + A(j)
  80  continue
      print *, "forward sum is = ", sumf
      print *, "reverse sum is = ", sumb
      end program
---------------------------------------------------------------------

We need a slightly different compile line:

 xlf90_r -q64 -qsmp=omp -o input_smp_test input_smp.f

(old line:  xlf90 -q64 -o input_serial_test input_test.f)
____________________________________________________

O.K.  So, how did we do? 

Serial timing:
time ./input_serial_test
 forward sum is =  0.4852211057E+12
 reverse sum is =  0.4852207124E+12

real    0m3.136s
user    0m1.713s
sys     0m0.115s

OpenMP timing:
 time ./input_smp_test
 forward sum is =  0.3957986771E+29
 reverse sum is =  0.3957989605E+29

real    0m9.086s
user    0m5.963s
sys     0m1.959s
--------------------------------------

Oops!

Let's add "temp" as a private variable in the line:

!$OMP  DO PRIVATE(j, temp)

We get:

>time ./new_smp
 forward sum is =  0.2485100971E+30
 reverse sum is =  0.2485099460E+30

real    0m8.551s
user    0m5.796s
sys     0m2.007s
----------------------------

Better, but still pretty poor.  Let's try identifying each loop to OpenMP.  Here is our parallel section now:

!$OMP PARALLEL
!$OMP  DO PRIVATE(j, temp)

      do 30 j = 1, counter
      call random_number(temp)
      temp = temp * paste
      A(j) = temp

  30  continue

!$OMP END DO
!$OMP DO  PRIVATE(j)

      do 50 j = 1, counter, 7
      A(j) = 785454654 * A(j)
  50  continue

!$OMP END DO
!$OMP DO PRIVATE(j)

       do 70 j = 1,counter , 3
       A(j) = A(j) * 0.0000000001
  70  continue

!$OMP END DO
!$OMP END PARALLEL


Looks good, so how about now?

> time ./new_smp
 forward sum is =  0.4869933957E+12
 reverse sum is =  0.4869911347E+12

real    0m6.230s
user    0m3.922s
sys     0m1.626s
------------------------------

Now we are only twice as slow as serial.  Any ideas why?


Tutorial overview:  https://computing.llnl.gov/tutorials/openMP/

Message Passing Libraries

Big Red is a parallel machine!  It is not structured for serial codes!

Serial codes waste 75% of the processor cores and the interconnect switch (which by itself is about 1/3 the cost of the machine).
Serial codes should normally run on Quarry.  However, a serial queue exists to backfill (fill in empty nodes) on both Big Red and Quarry.
 
1. Message Passing packages are selected in the SoftEnv environment.

2. No MPI libraries exist in the default environment. 

3. The library chosen must be linked consistently with the addressing precision (32 or 64 bit) chosen for the compile.

4. Once an MPI library is added, compiles are made through a wrapper to the compiler that built the library.
For instance, on Big Red, mpif90 is actually just a wrapper around xlf90_r.  The same switches used by xlf90_r are available to mpif90.
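For example, an MPI Fortran compile on Big Red might look roughly like this (hello_mpi.f is a placeholder source file; the switches are the same xlf ones discussed above):

    mpif90 -q64 -qarch=ppc970 -qtune=ppc970 -O3 -o hello_mpi hello_mpi.f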

A super link to go deeper into this subject:  http://kb.iu.edu/data/autn.html

 

MPICH: 

The Argonne original.  MPICH 1 is available; MPICH 2 could be made available.  It uses mpirun.

1.      It has a limited runtime environment.   Best Practice:  Pass paths as environment variables.

OpenMPI:

OpenMPI replaced LAM; no more lamboot.  It has the look and feel of MPICH to the user and is improving with each new release.

Tutorial overview:  https://computing.llnl.gov/tutorials/mpi/

Submitting A Job

LoadLeveler on Big Red

Writing the script:

The job script is divided into a keyword stanza and an execution section.
If any lines other than keywords exist in the script (including just #!/bin/bash), the script is executed.  Otherwise, it is sourced.

Best practice (though I don't always do it) is to write your job in its own script file and tell LL (LoadLeveler) to execute that.

All keywords are prefaced with "#@ "

There are two basic types of keywords: those you might expect and those you might not.

Expected keywords include:
output, error, executable, notification etc.

Unexpected keywords include:
node_usage, node, job_type, checkpoint, queue


Typical script explained:

#  Pretty typical file designator.  Will take absolute paths or try to write to your initial directory.
#@ output = test_namd.out 
#@ error = test_namd.err

#  When do you want the job to send an email?  Must put address in your .forward file.
#@ notification = complete

# How long to run?  The default is in seconds.
#@ wall_clock_limit = 1200

#  Lots of choices.  "COPY_ALL" sources your current environment for the job's environment.
#@ environment = COPY_ALL

# What queue to run the job in?
#@ class = LONG

#  Where to start running the job.
#@ initialdir =  /N/gpfsbr/namd_example
#
## Teragrid Account number goes in the line below.  "NONE" is the account number for IU users.
#@ account_no = NONE

#The name of the executable to be run.
#@ executable = My_execution_script.ksh

# Whatever else the executable wants put on its command line.
#@ arguments = fee fi foe fum

#  Job control keywords.
#@ node = 4
#@ tasks_per_node = 2
#@ job_type = parallel
#@ node_usage = not_shared

#  What to do if the job halts.  Checkpoint is not totally supported yet.
#@ checkpoint = no
#@ restart = no

#  The last keyword in the list.  Sort of a "stop" key.  Anything below this keyword will be executed.
#@ queue


Sample Job:

#@ output = test_namd.out
#@ error = test_namd.err
#@ notification = complete
#@ wall_clock_limit = 1200
#@ environment = COPY_ALL
#
##  DEBUG is a small debug queue, LONG is two weeks, NORMAL is for large jobs with a 48-hour max.
#@ class = NORMAL
#
## Please change this line to your work directory.
#@ initialdir =  /N/gpfsbr/namd_example
#
#@ executable = mpich_namd.bash
#
# Teragrid Account number goes in the line below.  I used none for test.
#@ account_no = NONE
#
#@ node_usage = not_shared
#@ node = 4
#@ tasks_per_node = 2
#@ job_type = parallel
#@ checkpoint = no
#@ queue

mpich_namd.bash:

#------------------------------------------------------------------
## Once again, this should point to your work directory if you use it.
cd /N/gpfsbr/namd_example
#
export NAMD2=`which namd2`
#
## Get the machine list (the list of nodes where your job will run on Big Red) and
# then write the list to /tmp/machinelist.$LOADL_STEP_ID so it can be passed to mpirun.
llmachinelist
#
## Make sure the number of tasks is <= (node * tasks_per_node)
mpirun -np 8  -machinefile /tmp/machinelist.$LOADL_STEP_ID $NAMD2 apoa1.namd

## Clean up temporary machine list file
rm /tmp/machinelist.$LOADL_STEP_ID


Submitting The Job:

Your SoftEnv environment must be set BEFORE you submit your job.

The command:  "llsubmit {name of LL script}"  will submit the job to run.  


Check The Job:

There are lots of ways to check on your job.  The simplest is:

llq | grep {your_username}

If your job isn't running, use the job number with "showstart" to find out when it is scheduled.
An example:   "showstart s10c2b5.10488.0"

A more in-depth command is "checkjob."  Here is an example:

checkjob s10c2b5.10692.0
job s10c2b5.10692.0

AName: 0
State: Running
Creds:  user:rsheppar  group:hpc  account:NONE  class:MED
WallTime:   00:00:00 of 00:20:00
SubmitTime: Wed May 30 19:00:03
  (Time Queued  Total: 00:00:41  Eligible: 00:00:41)

StartTime: Wed May 30 19:00:44
Total Requested Tasks: 8

Req[0]  TaskCount: 8  Partition: base
Memory >= 0  Disk >= 0  Swap >= 0
Opsys:   Linux2  Arch: PPC64  Features: ---

Allocated Nodes:
[s6c2b10.dim:2][s6c2b11.dim:2][s6c2b12.dim:2][s6c2b13.dim:2]

IWD:            /N/gpfsbr/namd_example
Executable:     /N/gpfsbr/namd_example/mpich_namd.bash
StartCount:    
Flags:          BACKFILL,RESTARTABLE
Attr:           BACKFILL
StartPriority:  3804
Reservation 's10c2b5.10692.0' (-00:00:09 -> 00:19:51  Duration: 00:20:00)

Workflows in LoadLeveler

1.  Each job is a static set of requirements.  Suppose you need a workflow; in other words, you may need to move a lot of data files first, then process them in parallel with MPI, and finally do some sort of serial post-processing of the results.  If all three jobs were submitted separately, the second step might try to run before the first step, or some other out-of-order action might happen.  LoadLeveler understands the job dependencies if they are submitted as multiple steps in a single script.  Each step is an independent job, but it may be made aware of its interdependence with the other steps.  Here is an example:

#/***************************/
#/* Job Step #1 - Small Parallel:            */
#/***************************/
#!/bin/ksh
#@ output = myb1.out
#@ error = myib1.err
#@ notification = always
#@ environment = COPY_ALL
#@ wall_clock_limit = 25:00
#@ class = LONG
#@ node = 1
#@ tasks_per_node = 4
#@ initialdir = /N/gpfsbr/YOUR_LOGIN_NAME/multistep
#@ job_type = parallel
#@ job_name = multistep
#@ shell = /bin/ksh
#@ step_name = job_step_1
#@ account_no = NA0101
#@ node_usage = not_shared
#@ checkpoint = no
#@ queue
#------------------------------------------------------------------
 tar -xvmf my_work.tar


#/***************************/
#/* Job Step #2 - Large Parallel:         */
#/***************************/
#@ output = myb2.out
#@ error = myb2.err
#@ notification = always
#@ environment = COPY_ALL
#@ wall_clock_limit = 12:00
#@ class = LONG
#@ node = 50
#@ tasks_per_node = 4
#@ initialdir = /N/gpfsbr/YOUR_LOGIN_NAME/multistep
#@ dependency = (job_step_1 == 0)
#@ job_name = multistep
#@ account_no = NA0101
#@ node_usage = not_shared
#@ network.mpi = css0,shared,us
#@ job_type = parallel
#@ step_name = job_step_2
#@ shell = /bin/ksh
#@ checkpoint = no
#@ queue
#------------------------------------------------------------------
./run.script.ksh


#/*************************/
#/* Job Step #3 - Serial: */
#/*************************/
#@ account_no=NA0101
#@ class = LONG
#@ output = myb3.out
#@ error = myb3.err
#@ dependency = (job_step_2 == 0)
#@ environment = COPY_ALL
#@ initialdir = /N/gpfsbr/YOUR_LOGIN_NAME/multistep
#@ executable = stage3.ksh
#@ input = /dev/null
#@ job_name = multistep
#@ job_type = serial
#@ node_usage = not_shared
#@ notification = always
#@ notify_user = YOUR_LOGIN_NAME
#@ shell = /bin/ksh
#@ step_name = job_step_3
#@ wall_clock_limit = 00:30:00
#@ queue

 

NOTES:  Step 1 executed a command, Step 2 executed a script of commands, while Step 3 used keywords to specify the command.

The order of the keywords is arbitrary.

Quarry uses PBS

Both LoadLeveler and PBS need to do the same job (and they share the same scheduler package), so many keywords map to each other:

Common commands

    Task                       TORQUE command          LL command
    Job submission             qsub [scriptfile]       llsubmit [scriptfile]
    Job deletion               qdel [job_id]           llcancel [job_id]
    Job status (for user)      qstat -u [username]     llq -u [username]
    Extended job status        qstat -f [job_id]       llq -l [job_id]
    Hold a job temporarily     qhold [job_id]          llhold [job_id]
    Resume job on hold         qrls [job_id]           llhold -r [job_id]
    List of usable queues      qstat -Q                llclass
    GUI for batch system       xpbs                    xload

Environment variables

    Item                      TORQUE variable         LL variable
    Job ID                    $PBS_JOBID              $LOADL_STEP_ID
    Submission directory      $PBS_O_WORKDIR          $LOADL_STEP_INITDIR
    Processor list            $PBS_NODEFILE           $LOADL_PROCESSOR_LIST (See note below.)

Note: For jobs using more than 128 tasks (processors/cores), the value of $LOADL_PROCESSOR_LIST will be set to NULL. If your job uses more than 128 tasks, run: /opt/ibmll/LoadL/full/bin/ll_get_machine_list > /tmp/machinelist.$LOADL_STEP_ID

Then use the temporary file (/tmp/machinelist.$LOADL_STEP_ID) created to get the list of machines assigned to your job.

Resource specifications

    Item                     TORQUE directive                   LL keyword
    Queue                    #PBS -q [queue]                    #@ class=[queue]
    Nodes                    #PBS -l nodes=[#]                  #@ node=[#]
    Processors               #PBS -l ppn=[#]                    #@ tasks_per_node=[#]
    Wall clock limit         #PBS -l walltime=[hh:mm:ss]        #@ wall_clock_limit=[hh:mm:ss]
    Standard output file     #PBS -o [file]                     #@ output=[file]
    Standard error           #PBS -e [file]                     #@ error=[file]
    Copy environment         #PBS -V                            #@ environment=COPY_ALL
    Notification event       #PBS -m abe                        #@ notification=start|error|complete|never|always
    Email address            #PBS -M [email]                    #@ notify_user=[email]
    Job name                 #PBS -N [name]                     #@ job_name=[name]
    Job restart              #PBS -r [y|n]                      #@ restart=[yes|no]
    Job type                 N/A                                #@ job_type=[type]
    Initial directory        N/A                                #@ initialdir=[directory]
    Node usage               N/A?                               #@ node_usage=not_shared
    Memory requirement       N/A                                #@ requirements=(Memory >= NumMegaBytes)

Common Moab scheduler commands

Big Red and Quarry use the Moab job scheduler.  Frequently used Moab scheduler commands include:

    Task                                          Moab command
    Show currently running/queued jobs            showq | less
    Check a job's status                          checkjob [jobid]  (or checkjob -v [jobid] for details)
    Show when your job might start                showstart [job_id]
    Show fairshare information (priority, etc.)   diagnose -f | less
    Check a node's status                         checknode [nodename]
    Show current reservations                     showres

 

Sample PBS Job on Quarry

#!/bin/bash

#PBS -l nodes=4:ppn=2,cput=4:00:00,walltime=3:00:00

#PBS -m ae

#PBS -N raytest

#PBS -o /N/u/rsheppar/Quarry/test_pbs2.out

#PBS -e /N/u/rsheppar/Quarry/test_pbs2.err

mpirun -np 8 -machinefile $PBS_NODEFILE hellompi

echo `cat $PBS_NODEFILE`

echo "this is just a test"

 

NOTES:  The script is executed.  Resource arguments can all be placed on one line.
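Submitting works as shown in the command table above; assuming the script were saved as test_pbs.job (a placeholder name), it would be:

    qsub test_pbs.job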

 

____________________________________________________________________________________________

Notes by: Ray Sheppard,  HPCST,  RT,  UITS,  Indiana University