From the user's perspective, a workload management system enables you to make more efficient use of your time by allowing you to specify the tasks you need run on the cluster. The system takes care of running these tasks and returning the results to you. If the cluster is full, then it holds your tasks and runs them when the resources are available.
PBS provides two user interfaces: a command-line interface (CLI) and a graphical user interface (GUI). You can use either to interact with PBS: both interfaces have the same functionality. (The examples below show the command line interface; see the "Using the PBS Graphical User Interface" section below for examples of the GUI.)
Using either interface, you create a batch job that you then submit to PBS. A batch job is a shell script containing the set of commands you want run on the cluster. It also contains directives that specify the resource requirements (such as memory or CPU time) that your job needs. Once you create your PBS job, you can reuse it, if you wish, or you can modify it for subsequent runs. Example job scripts are shown below.
PBS also provides a special kind of batch job called interactive batch. This job is treated just like a regular batch job (it is queued up and must wait for resources to become available before it can run). But once it is started, the user's terminal input and output are connected to the job in what appears to be an rlogin session. It appears that the user is logged into one of the nodes of the cluster, and the resources requested by the job are reserved for that job. Many users find this feature useful for debugging their applications or for computational steering.
Previously we mentioned that a PBS job is simply a shell script containing resource requirements of the job and the command(s) to be executed. (However, if you use the PBS graphical interface, you do not have to edit any batch files; instead, the GUI provides a point and click interface that creates the batch job script for you.) A sample PBS job might look like the following:
    #!/bin/sh
    #PBS -l walltime=1:00:00
    #PBS -l nodes=4
    #PBS -j oe
    cd ${HOME}/PBS/trial
    mpiexec -n 4 myprogram
This script would then be submitted to PBS using the qsub command.
Let us look at the script for a moment. The first line tells what shell to use to interpret the script. Lines 2-3 are resource directives, specifying arguments to the "resource list" ("-l") option of qsub. Note that all PBS directives begin with #PBS. These lines tell PBS what to do with your job. Any qsub option can also be placed inside the script by using a #PBS directive. However, PBS stops parsing directives with the first blank line encountered.
Returning to our example above, we see a request for one hour of wall-clock time and four nodes. The fourth line is a request for PBS to merge the stdout and stderr file streams of the job into a single file. The last two lines are the commands the user wants executed: change directory to a particular location, then execute an MPI program called 'myprogram'.
This job script could have been created in one of two ways: using a text editor, or using the xpbs graphical interface (see below).
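Because a job script is an ordinary text file, a third route is to generate it from a small shell function when only a few values change between runs. The following is a minimal sketch, not part of PBS: `make_job` is a hypothetical helper name, and it assumes the number of MPI processes should equal the number of nodes requested, as in the sample above. The resource-list option is written with a lowercase letter l.

```shell
#!/bin/sh
# make_job: hypothetical helper (not a PBS command) that prints a job
# script modelled on the sample above.
# Usage: make_job WALLTIME NODES > myscriptfile
make_job() {
    walltime=$1
    nodes=$2
    # \${HOME} is escaped so that it is expanded when the job runs,
    # not when the script is generated.
    cat <<EOF
#!/bin/sh
#PBS -l walltime=${walltime}
#PBS -l nodes=${nodes}
#PBS -j oe
cd \${HOME}/PBS/trial
mpiexec -n ${nodes} myprogram
EOF
}
```

For example, `make_job 2:00:00 8 > myscriptfile` would write a variant of the sample that requests two hours and eight nodes, ready to be passed to qsub.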
The command used to submit a job to PBS is qsub. For example, say you created a file containing your PBS job called 'myscriptfile'. The following example shows how to submit the job to PBS:
    % qsub myscriptfile
    12322.sol.pbspro.com
The second line in the example is the job identifier returned by the PBS Server. This unique identifier can be used to act on this job in the future (before it completes running). The next section of this chapter discusses using this "job id" in various ways.
The qsub command has a number of options that can be specified either on the command-line or in the job script itself. Note that any command-line option will override the same option within the script file.
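The override rule is convenient when a job needs slightly different resources for one run. The sketch below wraps that idea in a hypothetical helper, `resubmit`, which passes a new walltime on the command line and leaves the script file untouched. The helper name and the DRY_RUN variable are assumptions made for illustration; with DRY_RUN set, the command is printed rather than executed, so the sketch can be tried on a machine without PBS.

```shell
#!/bin/sh
# resubmit: hypothetical wrapper (not a PBS command).
# Usage: resubmit SCRIPT WALLTIME
# The walltime given here takes precedence over any
# "#PBS -l walltime=..." directive inside SCRIPT.
resubmit() {
    script=$1
    walltime=$2
    # Build the command as positional parameters to keep quoting intact.
    set -- qsub -l "walltime=${walltime}" "$script"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"    # show what would be run
    else
        "$@"                  # submit for real
    fi
}
```

Running `DRY_RUN=1 resubmit myscriptfile 2:00:00` prints the qsub command line without submitting anything.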
Table 17.1 lists the most commonly used options to qsub. See the PBS User Guide for the complete list and full description of the options.
Option | Purpose |
---|---|
-l list | List of resources needed by job |
-q queue | Queue to submit job to |
-N name | Name of job |
-S shell | Shell to execute job script |
-p priority | Priority of job relative to your jobs |
-a datetime | Delay job until after datetime |
-j oe | Join output and error files |
-h | Place a hold on job |
The "-l resource_list" option is used to specify the resources needed by the job. Table 17.2 lists all the resources available to jobs running on clusters.
Resource | Meaning |
---|---|
arch | System architecture needed by job |
cput | CPU time required by all processes in job |
file | Maximum single file disk space requirements |
mem | Total amount of RAM required |
ncpus | Number of CPUs (processors) required |
nice | Requested "nice" (Unix priority) value |
nodes | Number and/or type of nodes needed |
pcput | Maximum per-process CPU time required |
pmem | Maximum per-process memory required |
walltime | Total wall-clock time needed |
workingset | Total disk space requirements |
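Several resources can be requested in a single "-l" option by separating the name=value pairs with commas. A minimal sketch of assembling such a list follows; `resource_list` is a hypothetical helper, not a PBS command, and the values shown (including the 512mb memory figure) are illustrative only.

```shell
#!/bin/sh
# resource_list: hypothetical helper that joins its arguments with
# commas, producing a string suitable for the -l option of qsub.
# The body runs in a subshell so the IFS change does not leak out.
resource_list() (
    IFS=,
    printf '%s\n' "$*"
)

# Typical use on a cluster (not executed here):
#   qsub -l "$(resource_list walltime=1:00:00 nodes=4 mem=512mb)" myscriptfile
```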
Once the job has been submitted to PBS, you can use either the qstat or the xpbs command to check the job status. If you know the job identifier for your job, you can request the status explicitly. Note that unless you have multiple clusters, you need only specify the sequence number portion of the job identifier:
    % qstat 12322
    Job id        Name         User   Time Use S Queue
    ------------- ------------ ------ -------- - -----
    12322.sol     myscriptfile jjones 00:06:39 R submit
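In a script, the sequence number can be cut from the full identifier that qsub prints. The following is a minimal sketch, assuming identifiers of the form number.server-name as in the examples above; `job_seq` is a hypothetical helper name.

```shell
#!/bin/sh
# job_seq: print the part of a job identifier before the first dot.
# An identifier without a dot is printed unchanged.
job_seq() {
    printf '%s\n' "${1%%.*}"
}

# Typical use on a cluster (not executed here):
#   jobid=$(qsub myscriptfile)
#   qstat "$(job_seq "$jobid")"
```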
If you run the qstat command without specifying a job identifier, then you will receive status on all jobs currently queued and running.
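When many jobs are listed, a per-state tally is often easier to read than the full listing. The sketch below assumes the default qstat layout shown above, with two header lines and the state letter in the next-to-last column; `count_states` is a hypothetical helper, and other qstat options may produce a different layout.

```shell
#!/bin/sh
# count_states: read qstat output on standard input and print one line
# per job state with the number of jobs in that state.
count_states() {
    # Skip the two header lines, then count the next-to-last field.
    awk 'NR > 2 && NF >= 2 { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Typical use on a cluster (not executed here):
#   qstat | count_states
```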
Often users wonder why their job is not running. You can query this information from PBS using the "-s" (status) option of qstat, for example,
    % qstat -s 12323
    Job id        Name         User   Time Use S Queue
    ------------- ------------ ------ -------- - -----
    12323.sol     myscriptfile jjones 00:00:00 Q submit
        Requested number of CPUs not currently available.
A number of options to qstat change what information is displayed. The PBS User Guide gives the complete list.
So far we have seen several of the PBS user commands. Table 17.3 is provided as a quick reference for all the PBS user commands. Details on each can be found in the PBS manual pages and the PBS User Guide.
Command | Purpose |
---|---|
qalter | Alter job(s) |
qdel | Delete job(s) |
qhold | Hold job(s) |
qmsg | Send a message to job(s) |
qmove | Move job(s) to another queue |
qrls | Release held job(s) |
qrerun | Rerun job(s) |
qselect | Select a specific subset of jobs |
qsig | Send a signal to job(s) |
qstat | Show status of job(s) |
qsub | Submit job(s) |
xpbs | Graphical Interface (GUI) to PBS commands |
PBS provides two GUIs: a Tcl/Tk-based interface called xpbs and an optional Web-based interface.
The GUI xpbs provides a user-friendly point-and-click interface to the PBS commands. To run xpbs as a regular, nonprivileged user, type
    setenv DISPLAY your_workstation_name:0
    xpbs
To run xpbs with the additional ability to terminate PBS Servers, stop and start queues, or run or rerun jobs, type
xpbs -admin
Note that you must be identified as a PBS operator or manager in order for the additional "-admin" functions to take effect.
From this main xpbs window, you can create and submit jobs, monitor jobs, queues, and servers, as well as perform any of the actions that the command line interface permits you to do.
The optional Web-based user interface provides access to all the functionality of xpbs via almost any Web browser. To access it, you simply type the URL of your PBS Server host into your browser. The layout and usage are similar to those of xpbs.
Part of the PBS package is the PBS Interface Library, or IFL. This library provides a means of building new PBS clients. Any PBS service request can be invoked through calls to the interface library. Users may wish to build a PBS job that will check its status itself or submit new jobs, or they may wish to customize the job status display rather than use the qstat command. Administrators may use the interface library to build new control commands.
The IFL provides a user-callable function that corresponds to each PBS client command. There is (approximately) a one-to-one correlation between commands and PBS service requests. Additional routines are provided for network connection management. The user-callable routines are declared in the header file 'pbs_ifl.h'. Users request service of a batch server by calling the appropriate library routine and passing it the required parameters. The parameters correspond to the options and operands on the commands. The user must ensure that the parameters are in the correct syntax. The library routine will accept the parameters and build the corresponding batch request. This request is then passed to the server communication routine. Each function will return zero upon success and a nonzero error code on failure. These error codes are available in the header file 'pbs_error.h'. (The PBS API is fully documented in the PBS External Reference Specification.)