Main JOBMANAGER concept: Job

This chapter explains the main JOBMANAGER concept: a job.

What is a job?

A job is a unit of work that a user wants to run on a computation resource (a single computer or a cluster). The JOBMANAGER provides different types of job depending on what the user wants to do.

There are three types of job, described in the table below.

Type of job Description
Command script A shell script containing the user’s commands. This kind of job is not related to SALOME; it can be used to launch any code.
SALOME Python script A Python script that is launched in a SALOME session dedicated to this script.
YACS schema A YACS schema that is launched in a SALOME session dedicated to this schema.

Job content description

All job types share some attributes. Some types may also have specific attributes; these exceptions will be indicated in this documentation in the future. A job has two kinds of attributes: attributes that describe the job itself, and attributes that describe the computation requirements.

The first table below describes the attributes of a job.

Attribute Mandatory Description
Name Yes This is the name of the job. It is unique within a SALOME session.
Type Yes This is the type of the job. Currently, there are three types: command, python_salome and yacs_file.
Job file Yes This is the name, with the location, of the file containing the job’s data, e.g. /home/user/work.sh. Depending on the type, it can be a shell script, a Python script or a YACS schema.
Env file No An environment file can be attached to the job. It is executed before the job.
Input files No A list of files or directories on the user’s computer that have to be copied into the job’s work directory.
Output files No A list of files or directories that have to be copied from the job’s resource into the result directory on the user’s computer.
Work directory Yes It’s the directory on the job’s resource where the job will be executed.
Result directory Yes It’s the directory on the user’s computer where the job’s results are copied at the end of the job.
WC Key No The Workload Characterization Key is used on some clusters to associate each job with a project or organization.
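
The attribute table above can be sketched as a small Python data structure. This is only an illustrative model, not the JOBMANAGER or SALOME API: the class name `JobDescription`, its field names, the `validate` helper and the example directories are all hypothetical. Only the three type identifiers and the mandatory/optional split are taken from the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Type identifiers listed in the attribute table.
JOB_TYPES = ("command", "python_salome", "yacs_file")


@dataclass
class JobDescription:
    """Hypothetical container mirroring the job attributes (not a SALOME class)."""

    # Mandatory attributes
    name: str
    job_type: str
    job_file: str
    work_directory: str
    result_directory: str
    # Optional attributes
    env_file: Optional[str] = None
    input_files: List[str] = field(default_factory=list)
    output_files: List[str] = field(default_factory=list)
    wc_key: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means nothing is missing."""
        problems = []
        mandatory = ("name", "job_type", "job_file",
                     "work_directory", "result_directory")
        for attr in mandatory:
            if not getattr(self, attr):
                problems.append(f"missing mandatory attribute: {attr}")
        if self.job_type and self.job_type not in JOB_TYPES:
            problems.append(f"unknown job type: {self.job_type}")
        return problems


job = JobDescription(
    name="my_job",
    job_type="command",
    job_file="/home/user/work.sh",
    work_directory="/tmp/my_job",            # hypothetical path on the resource
    result_directory="/home/user/results",   # hypothetical path on the user's computer
)
print(job.validate())
```

Checking the mandatory attributes before submission mirrors the Not Created error state: a description that cannot be used is rejected before anything is sent to the resource.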

The second table below describes the attributes of computation requirements.

Attribute Description
Maximum duration The maximum expected duration of the job. When a batch manager is used, this time is interpreted as wall-clock time (walltime), not CPU time. If the maximum duration is not set or is set to 0, the default value of the selected batch queue is used.
Number of cpu The number of CPUs/cores requested.
Memory The amount of required memory. It is generally specified per node. With some batch managers, it is possible to specify the required memory per core (only available with SLURM for now).
Queue Optional. It selects a specific batch queue on the targeted cluster. If it is not defined, most batch systems assign the job to the queue that fits the other requirements.
Exclusive It indicates whether the job can share nodes with other jobs.

In addition to those attributes, the user can also specify some extra parameters with a few lines that will be added “as is” to the job submission file.
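
The computation requirements can be modelled in the same spirit. The sketch below is hypothetical (the class name, field names, units and default values are assumptions, not the SALOME API); it only illustrates the rule that an unset or zero maximum duration falls back to the default of the selected batch queue, and that extra parameters are kept as plain lines to be copied unchanged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComputationRequirements:
    """Hypothetical model of the requirement attributes (not a SALOME class)."""

    max_duration_minutes: int = 0      # 0 means "not set"; the unit is an assumption
    nb_cpu: int = 1                    # default value is an assumption
    memory_mb: Optional[int] = None    # the unit is an assumption
    memory_per_core: bool = False      # per the table, only available with SLURM
    queue: Optional[str] = None        # None lets the batch system choose
    exclusive: bool = False
    extra_params: List[str] = field(default_factory=list)  # added "as is"

    def effective_duration(self, queue_default_minutes: int) -> int:
        """Return the requested walltime, or the queue default if unset or 0."""
        if self.max_duration_minutes > 0:
            return self.max_duration_minutes
        return queue_default_minutes


req = ComputationRequirements(nb_cpu=16, memory_mb=4096, exclusive=True)
# No maximum duration was set, so the queue default applies.
print(req.effective_duration(queue_default_minutes=120))
```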

Job states

A job can be in one of several states in the JOBMANAGER. The table below describes the normal states.

State Description
Created The job has been created correctly and can be launched.
In_Process It’s a transient state between Created and Queued.
Queued The job is queued in the resource’s batch manager.
Paused The job is paused. Currently the JOBMANAGER GUI does not allow pausing a job.
Running The job is running on the resource.
Finished The job has run and is finished.

The table below describes the error states.

State Description
Not Created This state means that the job cannot be created with its current description. It’s often a problem with the selected resource.
Failed This state means that the execution of the job on the resource failed.
Error This state is used when a job is loaded but cannot be followed. It mainly happens when a job was launched on an ssh resource: if the job list is saved, the error appears when the list is loaded again, because a job on an ssh resource cannot be followed.
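
The two state tables can be summarised as an enumeration with a helper that separates normal states from error states. This is a minimal sketch: the names `JobState`, `ERROR_STATES`, `is_error` and `can_launch` are hypothetical and do not come from the JOBMANAGER code; only the state labels are taken from the tables.

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical enumeration of the states listed in the two tables."""

    # Normal states
    CREATED = "Created"
    IN_PROCESS = "In_Process"
    QUEUED = "Queued"
    PAUSED = "Paused"
    RUNNING = "Running"
    FINISHED = "Finished"
    # Error states
    NOT_CREATED = "Not Created"
    FAILED = "Failed"
    ERROR = "Error"


ERROR_STATES = frozenset(
    {JobState.NOT_CREATED, JobState.FAILED, JobState.ERROR}
)


def is_error(state: JobState) -> bool:
    """True if the state belongs to the error table."""
    return state in ERROR_STATES


def can_launch(state: JobState) -> bool:
    """Per the table, a job in the Created state can be launched."""
    return state is JobState.CREATED


# List the labels of the error states, in declaration order.
print([s.value for s in JobState if is_error(s)])
```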