- <html>
- <head>
- <meta http-equiv="content-type" content="text/html; charset=UTF-8">
- <meta charset="utf-8">
- <meta name="viewport" content="width=device-width, initial-scale=1">
- <link rel="stylesheet" href="static/css/rsq.css">
- <title>RSQueue</title>
- </head>
- <body>
- <div class="wrapper">
- <div class="content">
- <section class="top-nav">
- <div class="logo">
- <a class="logo" href="index.html"><h1>RSQ</h1></a>
- </div>
- <input id="menu-toggle" type="checkbox">
- <label class="menu-button-container" for="menu-toggle">
- <div class="menu-button"></div>
- </label>
- <ul class="menu">
- <li><a class="menu-link" href="index.html">Home</a></li>
- <li class="menu-here">Description</li>
- <li><a class="menu-link" href="documentation.html">Documentation</a></li>
- <li><a class="menu-link" href="rover.html">Who is Rover?</a></li>
- <li><a class="menu-link" href="source-code.html">Source Code</a></li>
- </ul>
- </section>
- <div class="body-content">
- <h1>Description</h1>
- <p>
- <b>Rsqueue consists of three complementary programs:</b>
- <ol>
- <li>
- rsq - The client program through which users interact with rsqueue: adding jobs, viewing status and history, and more.
- </li>
- <li>
- rsqd - The rsqueue daemon, which manages the queue and the compute nodes and distributes jobs across the compute nodes.
- </li>
- <li>
- node-rsqd - The node daemon, which manages processes/programs on the local node. It communicates with rsqd: it receives jobs for execution from rsqd and reports status and results back to it.
- </li>
- </ol>
- </p>
- <p>
- The rsqueue system can have multiple client programs (rsq) on multiple computers for users to interact with, but there can be only one rsqueue daemon (rsqd). It manages the queue, assigns jobs to the compute nodes, and serves the clients and the node daemons (node-rsqd). Each compute node that is registered with the rsqueue daemon and assigned to the system to run processes/programs should run one node daemon (node-rsqd).<br/>
- Both clients and node daemons need to be registered with the rsqueue daemon to be able to communicate with it.<br/>
- The rsqueue system is expected to be deployed on a local, private network; it is not designed to be spread across multiple networks or divided and distributed over the internet.<br/>
- The rsqueue system contains one rsqueue daemon and one node daemon on each compute node. The reason for having a daemon on each compute node is that the rsqueue daemon can be turned off without disturbing running processes on the compute nodes. While the rsqueue daemon is turned off, neither the status of running tasks nor the queue is updated, but both are updated once the rsqueue daemon is started again. The node daemon stores the current state of its processes while the rsqueue daemon is unreachable.
- </p>
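- <p>
- The buffering behaviour described above can be pictured with a minimal Python sketch. It is only an illustration under assumptions: the class <code>NodeDaemonSketch</code>, its methods and the <code>send</code> callback are hypothetical names and do not come from the rsqueue source code.
- </p>

```python
from dataclasses import dataclass, field


@dataclass
class NodeDaemonSketch:
    """Hypothetical stand-in for node-rsqd; not the real implementation."""

    # Latest known state per task id, kept locally on the node.
    task_states: dict = field(default_factory=dict)
    # Task ids whose state changed since the last successful report.
    unreported: set = field(default_factory=set)

    def record(self, task_id, state):
        """Store a state change locally, whether or not rsqd is reachable."""
        self.task_states[task_id] = state
        self.unreported.add(task_id)

    def sync(self, rsqd_reachable, send):
        """Report pending states if rsqd is reachable; otherwise keep them.

        Returns the number of task states that were reported.
        """
        if not rsqd_reachable:
            return 0
        pending = sorted(self.unreported)
        for task_id in pending:
            send(task_id, self.task_states[task_id])
        self.unreported.clear()
        return len(pending)
```

- <p>
- In this sketch only the latest state of each task is kept, so a task that changes state several times while the rsqueue daemon is off is reported once, with its current state.
- </p>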
- <br/>
- <p>
- <b>Definition of concepts:</b>
- <p>
- <i>Job:</i><br/>
- A job is what is queued. It contains a description of one or more processes/programs to be run on the compute nodes, either in sequence or in parallel. A job is in one of the following states: queue, running, terminated, cancelled or error.
- </p>
- <p>
- <i>Task:</i><br/>
- A task is a single process/program; a job is built up of one or more tasks.
- </p>
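- <p>
- The relation between jobs and tasks could be modelled roughly as in the Python sketch below. The five state names are taken from the text; the class names and the <code>command</code> and <code>parallel</code> fields are assumptions made for illustration and say nothing about the actual job file syntax.
- </p>

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    """The job states listed in the description."""

    QUEUE = "queue"
    RUNNING = "running"
    TERMINATED = "terminated"
    CANCELLED = "cancelled"
    ERROR = "error"


@dataclass
class Task:
    command: str  # hypothetical field: the process/program to run


@dataclass
class Job:
    tasks: list  # one or more Task objects
    parallel: bool = False  # False: run the tasks in sequence
    state: JobState = JobState.QUEUE  # a new job starts out queued

    def __post_init__(self):
        # A job is built up of one or more tasks, so reject an empty list.
        if not self.tasks:
            raise ValueError("a job needs at least one task")
```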
- <p>
- <i>Node:</i><br/>
- Node is short for computational node: a computer that is assigned by the rsqueue system to run tasks.
- </p>
- <p>
- <i>Queue system:</i><br/>
- The user creates a job file (with a specific syntax) containing one or more tasks, to be run in parallel or sequentially, and adds the job to the queue with the rsqueue client. The rsqueue daemon adds the job as the last job in the queue. The rsqueue daemon processes the queue from the top (first come, first served) and continues to assign jobs to the compute nodes as long as the resources specified for the job are available. At the first job that requests more resources than are available on the compute nodes, the rsqueue daemon stops processing the queue. Even if a job further down in the queue would fit the available resources, it will not be started: the queue is processed from top to bottom and no job can jump ahead of another job.
- </p>
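- <p>
- The strict first come, first served rule can be sketched in a few lines of Python. This is a simplified illustration, not the rsqd implementation: it assumes a single resource type (a pooled number of free cores), and the function name <code>start_jobs</code> and the tuple layout are hypothetical.
- </p>

```python
from collections import deque


def start_jobs(queue, free_cores):
    """Start jobs from the head of the queue until one does not fit.

    queue: deque of (job_name, cores_requested) tuples, oldest first.
    free_cores: cores currently available (one pooled number in this sketch).
    Returns the names of the started jobs; they are removed from the queue.
    """
    started = []
    while queue:
        name, cores = queue[0]
        if cores > free_cores:
            break  # strict FIFO: never look past the blocked head job
        queue.popleft()
        free_cores -= cores
        started.append(name)
    return started
```

- <p>
- With a queue of jobs requesting 2, 8 and 1 cores and 4 free cores, only the first job starts: the second job blocks the queue, and the third job waits even though it would fit.
- </p>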
- <p>
- <i>Register:</i><br/>
- Both clients and node daemons need to register with the rsqueue daemon to be allowed to communicate with it. A secret is shared and used for communication with the rsqueue daemon.
- </p>
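- <p>
- The description does not say how the shared secret is used. One common way to use a shared secret is to sign each message with an HMAC, as in the Python sketch below. This scheme is an assumption chosen for illustration, not the documented rsqueue protocol, and the function names <code>sign</code> and <code>verify</code> are hypothetical.
- </p>

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of message under the shared secret."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, message), signature)
```

- <p>
- In such a scheme, a message signed with a different secret, or altered after signing, fails verification, so only registered clients and node daemons can produce messages the daemon accepts.
- </p>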
- <p>
- <i>Dependency:</i><br/>
- Jobs can depend on each other: it is possible to specify that one job has to finish before the next job is allowed to start.
- </p>
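- <p>
- A dependency check of this kind might look like the following Python sketch. It assumes that "finished" corresponds to the job state <code>terminated</code>; the function <code>may_start</code> and the two dictionaries are hypothetical names used only for illustration.
- </p>

```python
def may_start(job, depends_on, states):
    """Return True if every job that `job` depends on has finished.

    depends_on: maps a job name to the names of the jobs it waits for.
    states: maps a job name to its current state string.
    Assumption: a dependency counts as finished in the state "terminated".
    """
    return all(states.get(dep) == "terminated" for dep in depends_on.get(job, ()))
```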
- <p>
- <i>Management:</i><br/>
- The administrator (root) can add compute nodes to the system and remove them from it. It is also possible to put a compute node in maintenance mode, which means that the node will finish its running processes but will not accept new jobs. When all running jobs have terminated, the node enters maintenance mode; it can then be turned off without disturbing the rsqueue system.
- </p>
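- <p>
- The transition into maintenance mode can be viewed as a small state machine, sketched below in Python. The intermediate mode name <code>draining</code>, the class <code>NodeSketch</code> and its methods are assumptions introduced for illustration; the description itself only names maintenance mode.
- </p>

```python
class NodeSketch:
    """Hypothetical model of a compute node's maintenance handling."""

    def __init__(self):
        self.mode = "active"  # "active", "draining" or "maintenance"
        self.running = set()  # ids of jobs currently running on the node

    def accepts_jobs(self):
        """Only an active node takes new jobs."""
        return self.mode == "active"

    def request_maintenance(self):
        """Stop accepting new jobs; running jobs are left to finish."""
        self.mode = "draining"
        self._settle()

    def job_finished(self, job_id):
        self.running.discard(job_id)
        self._settle()

    def _settle(self):
        # Enter maintenance mode once nothing is running any more.
        if self.mode == "draining" and not self.running:
            self.mode = "maintenance"
```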
- </p>
- </div>
- </div>
- <footer>
- <div>
- Copyright © 2023 Marcus Pedersén
- <br>
- <br>
- <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">
- <img alt="Creative Commons License" style="border-width:0" src="static/img/cc-by-sa.png">
- </a>
- <br>
- This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
- </div>
- </footer>
- </div>
-
- </body>
- </html>