L3Q is an abbreviation for: Light, Light, Lightweight Queue.
L3Q is a system to run processes/programs in parallel or in sequence on multiple nodes.
L3Q is an implementation of a light, light, lightweight queue system.
L3Q consists of three programs, each with a different purpose.
l3q is the client program that the user uses to interact with the system:
add jobs, view the queue, view history, check system status and so on.
l3qd is the central daemon that takes care of the queue,
assigns jobs to compute nodes, and stores the status of jobs and nodes.
It receives user input from the client program l3q and sends jobs to the node daemon node-l3qd.
node-l3qd is the node daemon that runs on each compute node.
The node daemon receives jobs from the central daemon (l3qd) and executes tasks
on the local node. The node daemon sends status and information about jobs
to the central daemon (l3qd).
The L3Q client is the main tool for interacting with the L3Q daemon.
With the client you can view the queue, jobs and tasks, and see the status
of the compute nodes. You can also use the client to add jobs, which the
L3Q daemon takes care of and executes when
resources are available. To use the client, the user must
be a member of the l3q group.
The default queueing system in the L3Q daemon is similar to a
FIFO (first in, first out) queue. The job that was added first is
launched first, and any further queued jobs are
processed in turn from the top. Each queued job is started if enough
resources are available for it, until there
are no queued jobs left or not enough resources are available.
Jobs are processed strictly from top to bottom: processing stops at the
first queued job that does not fit, even if smaller jobs
further down in the queue could be launched.
Such a job has to wait until all jobs queued before it
have been launched.
Jobs in the Depend state are not included in the queue until the jobs they
depend on have finished.
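The strict top-to-bottom behaviour described above can be modelled with a small shell sketch. This is an illustration only, not L3Q's actual implementation: the function name fifo_launch is hypothetical, and "resources" are reduced to a single count of free cores, which is an assumption.

```shell
#!/usr/bin/env bash
# Illustrative model of strict FIFO launching (not L3Q's real code).
# Usage: fifo_launch FREE_CORES JOB_CORES...
# Prints the 1-based queue positions of the jobs that would start.
fifo_launch() {
    local free=$1
    shift
    local pos=0 launched="" need
    for need in "$@"; do
        pos=$((pos + 1))
        # Stop at the first job that does not fit, even if a later,
        # smaller job would fit in the remaining cores.
        if [ "$need" -gt "$free" ]; then
            break
        fi
        free=$((free - need))
        launched="$launched $pos"
    done
    # Trim the leading space before printing.
    echo "${launched# }"
}

# 8 free cores; the queued jobs need 4, 2, 6 and 1 cores.
fifo_launch 8 4 2 6 1
```

In this model only the first two jobs start. The one-core job in fourth position would fit in the remaining two cores, but it waits behind the six-core job.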
To view the current queue:
# Use flag -l or --long for long listing
l3q show queue
# There is a shortcut, use
l3q
l3q --long
A job is the unit that is queued on the system and contains one
or more tasks. When a new job is added to the system, its initial
status is Queued.
A job can have the following statuses:
Jobs with status Queued, Running, Depend and Cancel
show up in the queue.
Displayed with command:
# With flag --long or -l
# for long listing
l3q show queue
# or
l3q
Jobs with status Terminated, Canceled, Error and Node-error
show up in the history.
Displayed with command:
# With flag --long or -l
# for long listing
l3q show history
Jobs can be one of two types: sequential or parallel.
To cancel a job run the following command:
# JOBID is the id of the job
# to be found in queue list
l3q cancel job JOBID
A user can only cancel jobs that the user has added,
that is, the user's own jobs. If other jobs need to be canceled,
a sysadmin can do this.
A job cannot be canceled while other jobs depend on it;
the cancel command will fail. All depending
jobs must be canceled first.
A task is the command that is executed on the compute node.
A job contains one or more tasks that are executed in sequence
or in parallel on the compute nodes. Tasks are specified in
taskfiles that are supplied with the command:
l3q add -t TASKFILE
The taskfile is required when you add a job to the queue.
The taskfile may contain a special syntax line as
well as lines containing tasks, which are mandatory.
Write one task per line; empty lines and lines starting with #
are ignored.
Lines containing tasks must have one or two columns: the first
column specifies the path to an executable program and the
optional second column contains the working directory. Columns are
separated by whitespace. If the second column, workdir, is not
specified, the user's home directory is used as the default workdir.
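The task-line rules above can be sketched as a small shell function. This is an illustration only, not L3Q's own parser: the function name parse_taskfile and the paths in the sample file are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative parser for the taskfile rules above (not L3Q's real code).
# Prints one "PROGRAM<TAB>WORKDIR" line per task.
parse_taskfile() {
    local file=$1 program workdir
    # read splits each line on whitespace: first column, then the rest.
    while read -r program workdir || [ -n "$program" ]; do
        # Skip empty lines and lines starting with '#'.
        case "$program" in
            ''|'#'*) continue ;;
        esac
        # No second column: fall back to the user's home directory.
        printf '%s\t%s\n' "$program" "${workdir:-$HOME}"
    done < "$file"
}

# Sample taskfile with hypothetical paths.
taskfile=$(mktemp)
cat > "$taskfile" <<'EOF'
# comment lines and empty lines are ignored

/opt/demo/prepare.sh /scratch/demo
/opt/demo/run.sh
EOF
parse_taskfile "$taskfile"
rm -f "$taskfile"
```

The first task is printed with /scratch/demo as its workdir. The second has no workdir column, so it is printed with the value of $HOME.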
Special syntax in TASKFILE:
#--add-para ...
#--add-seq ...
The special syntax line starts with a token that says whether the job is a sequential
job or a parallel job. Command-line arguments can be added on the same line
so that they do not need to be specified on the command line.
The special syntax is required in the file if the job is added with the command:
l3q add -t TASKFILE
If the job is added with the command:
l3q add para ...
Arguments are parsed in the following way:
If TASKFILE only contains one task on each line, then both options --cores and --nodes are required.
If TASKFILE contains a line with the special syntax specifying the required options, then no options are
required on the command line.
If TASKFILE contains the special syntax line and --cores and --nodes are given as arguments, these
options will be used and the special syntax line will be ignored.
If the job is added with the command:
l3q add seq ...
Arguments are parsed in the following way:
If TASKFILE contains a line with the special syntax specifying the name of the task, this name will be used.
If TASKFILE contains the special syntax line and --jobname is given as an argument, this option will be used
and the special syntax line will be ignored.
If TASKFILE contains multiple special syntax lines, the last line will be used.
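To check which special syntax line takes effect when a taskfile contains more than one, a helper along these lines may be useful. It is a sketch only: the function name effective_special_line is hypothetical, and it assumes the special syntax line starts at the beginning of the line, as in the examples above.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of L3Q): print the special syntax line
# that would take effect, i.e. the last one found in the taskfile.
effective_special_line() {
    grep -E '^#--add-(para|seq)' "$1" | tail -n 1
}

# Sample taskfile with two special syntax lines; the "..." stands for
# the job's options, exactly as in the text above.
taskfile=$(mktemp)
cat > "$taskfile" <<'EOF'
#--add-seq ...
#--add-para ...
EOF
effective_special_line "$taskfile"
rm -f "$taskfile"
```

Here the #--add-para line is printed, because it comes last in the file.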
When a job is added, it is possible to specify that the new job
depends on other jobs with the --depend flag.
Specify dependencies when the job is added:
# specify dependencies as a comma
# separated list of jobids
l3q add -t TASKFILE --depend 5,18,23
Dependencies must be specified as a comma-separated list of
the jobids of other jobs, without any whitespace. This list
is given as the value of the --depend argument.
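When jobids are collected by a script, they have to be joined into this comma-separated form before they are passed to --depend. A minimal sketch, where the helper name join_jobids is hypothetical and not part of L3Q:

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of L3Q): join jobids with commas and
# no whitespace, the form that --depend expects.
join_jobids() {
    # "$*" joins the arguments with the first character of IFS.
    local IFS=,
    echo "$*"
}

join_jobids 5 18 23
```

This prints 5,18,23. The result can then be used as the value of --depend, for example l3q add -t TASKFILE --depend "$(join_jobids 5 18 23)".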
When a job has dependencies, the jobs it depends on have to
terminate before this job is added to the queue.
If one of those jobs terminates with an error, this
job will also change status to Error and will not be
queued or executed.
The add command has a flag, -i or --jobid, that returns the jobid
assigned to the added job. This makes it
possible to write scripts that add dependencies as
the jobs are added.
#!/usr/bin/bash
# The last added job will
# depend on the first two jobs
a=$(l3q add --jobid .....)
b=$(l3q add --jobid .....)
l3q add --depend "$a,$b" .....
To see the status of the compute nodes, the nodes that run all jobs,
run the following command:
l3q show status
This displays a list of the resources that are available and the
resources that are in use. The state of each compute node is also shown.
The node can have the following state:
l3qd is the central daemon that interacts with all clients
and all node daemons (node-l3qd). All commands entered in
the l3q client are sent to the central daemon and processed, and
the result is returned to the client.
The l3q daemon periodically updates the status of the
compute nodes, sets nodes offline if they are unreachable and
updates the queue. If enough resources are available for jobs
to start, the jobs are sent to node-l3qd for execution.
The l3q daemon receives periodic updates from node-l3qd on the jobs
running on each node; each node is responsible for sending updates
to the central daemon.
The L3Q node daemon is the daemon running on each compute node.
It receives tasks from the central daemon (l3qd) and executes them
on the host in a systemd slice. The status of the tasks
is periodically sent to the central daemon (l3qd).