17.2 Using PBS

From the user's perspective, a workload management system enables you to make more efficient use of your time by allowing you to specify the tasks you need to run on the cluster. The system takes care of running these tasks and returning the results to you. If the cluster is full, it holds your tasks and runs them when the resources become available.

PBS provides two user interfaces: a command-line interface (CLI) and a graphical user interface (GUI). You can use either to interact with PBS, because both interfaces have the same functionality. (The examples below use the command-line interface; see the "Using the PBS Graphical User Interface" section below for a description of the GUI.)

Using either interface, you create a batch job that you then submit to PBS. A batch job is a shell script containing the set of commands you want run on the cluster. It also contains directives that specify the resources (such as memory or CPU time) that your job needs. Once you have created your PBS job, you can reuse it as is or modify it for subsequent runs. An example job script is shown below.

PBS also provides a special kind of batch job called interactive batch, which is requested with the -I option of qsub. This job is treated just like a regular batch job: it is queued and must wait for resources to become available before it can run. Once it starts, however, the user's terminal input and output are connected to the job in what appears to be an rlogin session. The user appears to be logged in to one of the nodes of the cluster, and the resources requested by the job are reserved for that job. Many users find this feature useful for debugging their applications or for computational steering.

17.2.1 Creating a PBS Job

Previously we mentioned that a PBS job is simply a shell script containing the resource requirements of the job and the command(s) to be executed. (If you use the PBS graphical interface, however, you do not have to edit any batch files; the GUI provides a point-and-click interface that creates the batch job script for you.) A sample PBS job might look like the following:

        #!/bin/sh
        #PBS -l walltime=1:00:00
        #PBS -l nodes=4
        #PBS -j oe

        cd ${HOME}/PBS/trial
        mpiexec -n 4 myprogram

This script would then be submitted to PBS using the qsub command.

Let us look at the script for a moment. The first line tells which shell to use to interpret the script. Lines 2-3 are resource directives, specifying arguments to the "resource list" ("-l") option of qsub. Note that all PBS directives begin with #PBS. These lines tell PBS what to do with your job. Any qsub option can also be placed inside the script by using a #PBS directive. However, PBS stops parsing directives at the first blank line it encounters.

Returning to our example above, we see a request for one hour of wall-clock time and four nodes. The fourth line is a request for PBS to merge the stdout and stderr file streams of the job into a single file. The last two lines are the commands the user wants executed: change directory to a particular location, then execute an MPI program called 'myprogram'.
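To make the directive-parsing rule concrete, the following POSIX sh sketch lists the #PBS lines that would be picked up from the top of a job script, stopping at the first blank line as described above. The function name pbs_directives is hypothetical, and this is only an illustration: qsub does its own parsing, and the exact rules may differ between PBS versions.

```shell
# pbs_directives FILE
# Hypothetical helper: print the #PBS directive lines that appear before
# the first blank (or whitespace-only) line of FILE, following the rule
# described in the text. It does not contact PBS at all.
pbs_directives() {
    # sed prints lines up to and including the first blank line, then
    # quits; grep keeps only the lines that start with the #PBS prefix.
    sed '/^[[:space:]]*$/q' "$1" | grep '^#PBS'
}
```

Run as pbs_directives myscriptfile on the sample script, it would print the three #PBS lines and nothing from the commands that follow the blank line.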

This job script could have been created in one of two ways: using a text editor, or using the xpbs graphical interface (see below).

17.2.2 Submitting a PBS Job

The command used to submit a job to PBS is qsub. For example, say you saved your PBS job in a file called 'myscriptfile'. The following example shows how to submit the job to PBS:

        % qsub myscriptfile
        12322.sol.pbspro.com

The second line in the example is the job identifier returned by the PBS Server. This unique identifier can be used to act on this job in the future (before it completes running). The next section of this chapter discusses using this "job id" in various ways.
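In a shell script it is convenient to capture the job identifier that qsub prints, so that later commands can refer to it. A minimal sketch, assuming a POSIX shell; job_seq is a hypothetical helper that extracts the sequence-number part (everything before the first dot):

```shell
# job_seq JOBID
# Hypothetical helper: print the sequence-number part of a PBS job
# identifier, e.g. 12322.sol.pbspro.com -> 12322.
job_seq() {
    printf '%s\n' "${1%%.*}"
}

# Typical use (requires a working PBS installation, so shown as comments):
#   jobid=$(qsub myscriptfile)     # e.g. 12322.sol.pbspro.com
#   qstat "$(job_seq "$jobid")"    # same as: qstat 12322
```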

The qsub command has a number of options that can be specified either on the command line or in the job script itself. Note that an option given on the command line overrides the same option within the script file.

Table 17.1 lists the most commonly used options to qsub. See the PBS User Guide for the complete list and full description of the options.

Table 17.1: Qsub options.
Option	Purpose

-l list	List of resources needed by job
-q queue	Queue to submit job to
-N name	Name of job
-S shell	Shell to execute job script
-p priority	Priority of job relative to your other jobs
-a datetime	Delay job until after datetime
-j oe	Join output and error files
-h	Place a hold on job
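The datetime argument of -a is written in the form [[[[CC]YY]MM]DD]hhmm[.SS]. As a sketch, assuming a POSIX shell and the standard date command, the hypothetical helper below builds a full CCYYMMDDhhmm value for a given time later today:

```shell
# pbs_today_at HHMM
# Hypothetical helper: print a qsub -a datetime value (CCYYMMDDhhmm)
# for the given hour and minute on today's date.
pbs_today_at() {
    case "$1" in
        [0-2][0-9][0-5][0-9]) ;;   # crude HHMM shape check only
        *) echo "usage: pbs_today_at HHMM" >&2; return 1 ;;
    esac
    printf '%s%s\n' "$(date +%Y%m%d)" "$1"
}

# Typical use (needs PBS): job becomes eligible to run after 22:00 today.
#   qsub -a "$(pbs_today_at 2200)" myscriptfile
```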

The "-l resource_list" option is used to specify the resources needed by the job. Table 17.2 lists all the resources available to jobs running on clusters.

Table 17.2: PBS resources.
Resource	Meaning

arch	System architecture needed by job
cput	CPU time required by all processes in job
file	Maximum disk space required by any single file
mem	Total amount of memory (RAM) required
ncpus	Number of CPUs (processors) required
nice	Requested "nice" (Unix priority) value
nodes	Number and/or type of nodes needed
pcput	Maximum per-process CPU time required
pmem	Maximum per-process memory required
walltime	Total wall-clock time needed
workingset	Total disk space requirements
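Several resources can be requested with a single -l option by separating them with commas, as in -l walltime=1:00:00,nodes=4. A small sketch, assuming a POSIX shell; pbs_resources is a hypothetical helper, and the mem value in the usage line is only an example:

```shell
# pbs_resources NAME=VALUE ...
# Hypothetical helper: join resource requests into the comma-separated
# form accepted by qsub -l.
pbs_resources() {
    (
        IFS=,                 # "$*" joins arguments with the first IFS char
        printf '%s\n' "$*"
    )                         # subshell keeps the IFS change local
}

# Typical use (needs PBS):
#   qsub -l "$(pbs_resources walltime=1:00:00 nodes=4 mem=512mb)" myscriptfile
```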

17.2.3 Getting the Status of a PBS Job

Once the job has been submitted to PBS, you can use either the qstat or xpbs commands to check the job status. If you know the job identifier for your job, you can request the status explicitly. Note that unless you have multiple clusters, you need only specify the sequence number portion of the job identifier:

    % qstat 12322
    Job id        Name         User   Time Use S Queue
    ------------- ------------ ------ -------- - -----
    12322.sol     myscriptfile jjones 00:06:39 R submit

If you run the qstat command without specifying a job identifier, you receive the status of all jobs currently queued and running.
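Because the default qstat display is plain columns, it is easy to post-process in scripts. The sketch below assumes the six-column layout shown in the example above (which may differ between PBS versions) and prints each job identifier with its one-letter state from the S column, such as R (running) or Q (queued). job_states is a hypothetical name.

```shell
# job_states
# Hypothetical filter: read default qstat output on stdin and print
# "JOBID STATE" for each job line. It skips the two header lines and
# any line whose fifth field is not a single capital letter (for
# example, comment lines such as those printed by qstat -s).
job_states() {
    awk 'NR > 2 && $5 ~ /^[A-Z]$/ { print $1, $5 }'
}

# Typical use (needs PBS):
#   qstat | job_states
```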

Users often wonder why their job is not running. You can ask PBS for this information with the "-s" (status) option of qstat. For example:

    % qstat -s 12323
    Job id        Name         User   Time Use S Queue
    ------------- ------------ ------ -------- - -----
    12323.sol     myscriptfile jjones 00:00:00 Q submit
      Requested number of CPUs not currently available.

A number of options to qstat change what information is displayed. The PBS User Guide gives the complete list.

17.2.4 PBS Command Summary

So far we have seen several of the PBS user commands. Table 17.3 is provided as a quick reference for all the PBS user commands. Details on each can be found in the PBS manual pages and the PBS User Guide.

Table 17.3: PBS user commands.
Command	Purpose

qalter	Alter job(s)
qdel	Delete job(s)
qhold	Hold job(s)
qmsg	Send a message to job(s)
qmove	Move job(s) to another queue
qrls	Release held job(s)
qrerun	Rerun job(s)
qselect	Select a specific subset of jobs
qsig	Send a signal to job(s)
qstat	Show status of job(s)
qsub	Submit job(s)
xpbs	Graphical Interface (GUI) to PBS commands
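The commands in Table 17.3 combine naturally in scripts. For example, qselect prints the identifiers of the jobs that match its selection options, one per line, so its output can drive the other commands. A minimal sketch, assuming a POSIX shell; for_each_my_job is a hypothetical helper, and only qselect's -u (user) option is used:

```shell
# for_each_my_job COMMAND [ARGS...]
# Hypothetical helper: run "COMMAND ARGS... JOBID" once for every job
# owned by the current user, as listed by "qselect -u USER".
for_each_my_job() {
    qselect -u "${USER:-$(id -un)}" | while read -r jobid; do
        "$@" "$jobid"
    done
}

# Typical use (needs PBS):
#   for_each_my_job qhold    # hold all of my jobs
#   for_each_my_job qrls     # release them again
```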

17.2.5 Using the PBS Graphical User Interface

PBS provides two graphical interfaces: a Tcl/Tk-based GUI called xpbs and an optional Web-based GUI.

The GUI xpbs provides a user-friendly point-and-click interface to the PBS commands. To run xpbs as a regular, nonprivileged user, type

        setenv DISPLAY your_workstation_name:0
        xpbs
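The setenv command above is C shell syntax. In sh-family shells (sh, bash, ksh) the same setting is made as follows; your_workstation_name is the same placeholder used above:

```shell
# Bourne-shell equivalent of "setenv DISPLAY your_workstation_name:0"
DISPLAY=your_workstation_name:0
export DISPLAY
# then start the GUI with: xpbs
```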

To run xpbs with the additional ability to terminate PBS Servers, stop and start queues, and run or rerun jobs, type

        xpbs -admin

Note that you must be identified as a PBS operator or manager in order for the additional "-admin" functions to take effect.

From the main xpbs window, you can create and submit jobs; monitor jobs, queues, and servers; and perform any of the other actions that the command-line interface permits.

The optional Web-based user interface provides access to all the functionality of xpbs via almost any Web browser. To access it, you simply type the URL of your PBS Server host into your browser. The layout and usage are similar to those of xpbs.

17.2.6 PBS Application Programming Interface

Part of the PBS package is the PBS Interface Library, or IFL. This library provides a means of building new PBS clients. Any PBS service request can be invoked through calls to the interface library. Users may wish to build a PBS job that will check its status itself or submit new jobs, or they may wish to customize the job status display rather than use the qstat command. Administrators may use the interface library to build new control commands.

The IFL provides a user-callable function that corresponds to each PBS client command; there is (approximately) a one-to-one correlation between commands and PBS service requests. Additional routines are provided for network connection management. The user-callable routines are declared in the header file 'pbs_ifl.h'. Users request service of a batch server by calling the appropriate library routine and passing it the required parameters, which correspond to the options and operands of the commands. The user must ensure that the parameters are in the correct syntax. The library routine accepts the parameters, builds the corresponding batch request, and passes that request to the server communication routine. Most routines return zero upon success and a nonzero error code on failure; the error codes are defined in the header file 'pbs_error.h'. (The PBS API is fully documented in the PBS External Reference Specification.)