Main JOBMANAGER concept: Job

This chapter explains the main JOBMANAGER concept: a job.

What is a job?

A job is a work that a user wants to perform on a computation resource (single computer or a cluster). The JOBMANAGER provides different types of job depending of what a user wants to do.

There are three types of described in the table below.

Type of job

Description

Command script

It’s a shell script containing the user’s commands. This kind of job is not related to SALOME. It could be used to launch any codes.

SALOME Python script

It’s a Python script that will be launched into a SALOME session dedicated to this script.

YACS schema

It’s a YACS schema that will be launched into a SALOME session dedicated to this schema.

Job content description

All types of job share some attributes. There could be specific attributes for some types of jobs. These exceptions will be indicated in the future in this documentation. A job has two kinds of attributes: attributes that describes the job himself, and attributes that describes the computation requirements.

The first table below describes the attributes of a job.

Attribute

Mandatory

Description

Name

Yes

This is the name of the job. It’s unique into a SALOME session.

Type

Yes

This is the type of the job. Currently, there are three types: command, python_salome and yacs_file.

Job file

Yes

This is the name, with the location, of the file containing the job’s data. Depending of the type it could a shell script, a Python script or a YACS schema, e.g. /home/user/work.sh.

Env file

No

An environment file could be attached to the job. It will be executed before the job.

Input files

No

A list of files or directories in the user computer that have to copied into the job’s work directory.

Output files

No

A list of files or directories that have to be copied from the job’s resource to the user computer into the result directory.

Work directory

Yes

It’s the directory on the job’s resource where the job will be executed.

Result directory

Yes

It’s the directory in the user computer where the job’s results have to be copied at the end of the job.

WC Key

No

The Workload Characterization Key is used on some clusters to associate each job with a project or organization.

The second table below describes the attributes of computation requirements.

Attribute

Description

Maximum duration

It’s the maximum expected duration of the job. When a batch manager is used, this time is interpreted as a walltime and not as a cputime. If maximum duration is not set or set to 0, the time will be set to the default value of the batch queue selected.

Number of cpu

It’s the number of cpus/cores requested.

Memory

It’s the amount of required memory. It is generally specified per node. With some batch mangers, it is possible to specify the required memory per core (only available with SLURM for now).

Queue

It’s optional. It permits to choose a specific batch queue on the targeted cluster. If it is not defined, most of the batch systems will affect your job to the queue that fits with the other attributes requirements.

Exclusive

It indicates if the job can share nodes with other jobs or not.

In addition to those attributes, the user can also specify some extra parameters with a few lines that will be added “as is” to the job submission file.

Job’s states

A job could have many states in the JOBMANAGER. The table below describes the normal states.

State

Description

Created

The job is correctly created and could be launched.

In_Process

It’s a transient state between Created and Queued.

Queued

The job is queued into the resource’s batch manager.

Paused

The job is paused. Currently the JOBMANAGER GUI does not allow to paused a job.

Running

The job is running on the resource.

Finished

The job has run and it’s finished.

The table below describes the error states.

State

Description

Not Created

This state means that the job cannot be created with it’s current description. It’s often a problem with the selected resource.

Failed

This state means that the execution of the job in the resource failed.

Error

This state is used when a job is loaded and that it cannot be followed. It mainly happens when a job was launched into a ssh resource. If the list is saved, an error will happen when the list is loaded (ssh resource cannot be followed).