17.4 Configuring PBS

Now that PBS has been installed, the Server and MOMs can be configured and the scheduling policy selected. Further configuration may not be required, since PBS Pro comes preconfigured and the defaults may meet your needs completely. You are nevertheless advised to read this section to determine whether the defaults suffice for your site or whether any of the optional settings apply.

17.4.1 Network Addresses and PBS

PBS uses fully qualified host names to identify jobs and their location. A PBS installation is known by the host name on which the Server is running. The name used by the daemons, and used to authenticate messages, is the canonical host name. This name is taken from the primary name field, h_name, in the structure returned by the library call gethostbyaddr(). According to the IETF RFCs, this name must be fully qualified and consistent for any IP address assigned to that host.
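
To verify, you can check that reverse resolution of the host's IP address yields the fully qualified name. The sketch below assumes a Unix host with the standard host utility; the host name and address shown are hypothetical:

        % host 10.0.0.3
        3.0.0.10.in-addr.arpa domain name pointer mars.example.com.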

17.4.2 The Qmgr Command

The PBS manager command, qmgr, provides a command-line administrator interface. The command reads directives from standard input. The syntax of each directive is checked and the appropriate request sent to the Server(s). A qmgr directive takes one of the following forms:

    command server [names] [attr OP value[,...]]
    command queue  [names] [attr OP value[,...]]
    command node   [names] [attr OP value[,...]]

where command is the command to perform on an object. The qmgr commands are listed in Table 17.4.

Table 17.4: qmgr commands.

    Command   Explanation
    -------   -----------
    active    Set the active objects.
    create    Create a new object (applies to queues and nodes).
    delete    Destroy an existing object (queues or nodes).
    set       Define or alter attribute values of the object.
    unset     Clear the value of the attributes of the object.
    list      List the current attributes and values of the object.
    print     Print all the queue and server attributes.

The list or print subcommands of qmgr can be executed by the general user. Creating or deleting a queue requires PBS Manager privilege. Setting or unsetting server or queue attributes requires PBS Operator or Manager privilege.
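
These privileges are themselves granted through qmgr by setting the Server's managers and operators attributes. The following sketch uses hypothetical account names; consult the PBS Administrator Guide for the exact user@host syntax:

        % qmgr
        Qmgr: set server managers += admin@*
        Qmgr: set server operators += ops@*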

Here are several examples that illustrate using the qmgr command. These and other qmgr commands are fully explained below, along with the specific tasks they accomplish.

        % qmgr
        Qmgr: create node mars np=2,ntype=cluster
        Qmgr: create node venus properties="inner,moonless"
        Qmgr: set node mars properties = inner
        Qmgr: set node mars properties += haslife
        Qmgr: delete node mars
        Qmgr: d n venus

Commands can be abbreviated to their minimum unambiguous form (as shown in the last line in the example above). A command is terminated by a newline character or a semicolon, so multiple commands may be entered on a single line. A command may extend across lines by escaping the newline character with a backslash. Comments begin with a hash sign ("#") and continue to the end of the line. Comments and blank lines are ignored by qmgr. See the qmgr section of the PBS Administrator Guide for a detailed usage and syntax description.
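
The following short session illustrates the comment and multiple-command conventions, assuming the default queue workq exists:

        % qmgr
        Qmgr: # comments such as this one are ignored
        Qmgr: list server ; list queue workq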

17.4.3 Nodes

Where jobs will be run is determined by an interaction between the Scheduler and the Server. This interaction is affected by the contents of the PBS 'nodes' file and the system configuration onto which you are deploying PBS. Without this list of nodes, the Server will not establish a communication stream with the MOM(s), and MOM will be unable to report information about running jobs or to notify the Server when jobs complete. In a cluster configuration, distributing jobs across the various hosts is a matter of the Scheduler determining on which host to place a selected job.

Regardless of the type of execution nodes, each node must be defined to the Server in the PBS nodes file (the default location of which is '/usr/spool/PBS/server_priv/nodes'). This is a simple text file that specifies a single node per line. The format of each line in the file is

        node_name[:ts] [attributes]

The node name is the network name of the node (host name); it does not have to be fully qualified (in fact, it is best kept as short as possible). The optional ":ts" appended to the name indicates that the node is a timeshared node (i.e., a node on which multiple jobs may be run if the required resources are available).

Nodes can have attributes associated with them. Attributes come in three types: properties, name=value pairs, and name.resource=value pairs. Zero or more properties may be specified. The property is nothing more than a string of alphanumeric characters (first character must be alphabetic) without meaning to PBS. Properties are used to group classes of nodes for allocation to a series of jobs.

Any legal node attribute name=value pair may be specified in the nodes file in the same format as on a qsub directive: attribute.resource=value. Consider the following example:

        NodeA resources_available.ncpus=3 max_running=1

The expression np=N may be used as shorthand for the expression

    resources_available.ncpus=N

which declares the number of virtual processors (VPs) on the node. The value N must be a number, for example, np=4. Such a node may be allocated up to N times, whether to a single job or across several jobs. If np=N is not specified for a cluster node, the node is assumed to have one VP.
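
For instance, the following two nodes-file entries declare the same thing; either form may be used (the host name is hypothetical):

        mars       np=2
        mars       resources_available.ncpus=2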

You may edit the nodes list in one of two ways. If the server is not running, you may directly edit the nodes file with a text editor. If the server is running, you should use qmgr to edit the list of nodes.

Each item on the line must be separated by white space. The items may be listed in any order except that the host name must always be first. Comment lines may be included if the first nonwhite space character is the hash sign ("#").

The following is an example of a possible nodes file for a cluster called "planets":

        # The first set of nodes are cluster nodes.
        # Note that the properties are provided to
        # logically group certain nodes together.
        # The last node is a timeshared node.
        #
        mercury    inner moonless
        venus      inner moonless np=1
        earth      inner np=1
        mars       inner np=2
        jupiter    outer np=18
        saturn     outer np=16
        uranus     outer np=14
        neptune    outer np=12
        pluto:ts

17.4.4 Creating or Adding Nodes

After pbs_server is started, the node list may be entered or altered via the qmgr command:

        create node node_name [attribute=value]

where the attributes and their associated possible values are shown in Table 17.5.

Table 17.5: PBS node attributes.

    Attribute                        Value
    ---------                        -----
    state                            free, down, offline
    properties                       any alphanumeric string
    ntype                            cluster, time-shared
    resources_available.ncpus (np)   number of virtual processors > 0
    resources_available              list of resources available on node
    resources_assigned               list of resources in use on node
    max_running                      maximum number of running jobs
    max_user_run                     maximum number of running jobs per user
    max_group_run                    maximum number of running jobs per group
    queue                            queue name (if any) associated with node
    reservations                     list of reservations pending on the node
    comment                          general comment

Below are several examples of setting node attributes via qmgr:

        % qmgr
        Qmgr: create node mars np=2,ntype=cluster
        Qmgr: create node venus properties="inner,moonless"

Once a node has been created, its attributes and/or properties can be modified by using the following qmgr syntax:

        set node node_name [attribute[+|-]=value]

where attributes are the same as for create, for example,

        % qmgr
        Qmgr: set node mars properties=inner
        Qmgr: set node mars properties+=haslife

Nodes can be deleted via qmgr as well, using the delete node syntax, as the following example shows:

        % qmgr
        Qmgr: delete node mars
        Qmgr: delete node pluto

Note that the busy state is set by the execution daemon, pbs_mom, when a load-average threshold is reached on the node (see the max_load entry in MOM's configuration file). The job-exclusive and job-sharing states are set when jobs are running on the node.
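
A sketch of the corresponding MOM configuration entry is shown below; the threshold value is arbitrary, and the exact directive syntax should be verified against the PBS Administrator Guide:

        # mark the node busy when the load average exceeds 4.0
        $max_load 4.0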

17.4.5 Default Configuration

Server management consists of configuring the Server and establishing queues and their attributes. The default configuration, shown below, sets the minimum server settings and some recommended settings for a typical PBS cluster.

        % qmgr
        Qmgr: print server
        # Create queues and set their attributes
        #
        # Create and define queue workq
        #
        create queue workq
        set queue workq queue_type = Execution
        set queue workq enabled = True
        set queue workq started = True
        #
        # Set Server attributes
        #
        set server scheduling = True
        set server default_queue = workq
        set server log_events = 511
        set server mail_from = adm
        set server query_other_jobs = True
        set server scheduler_iteration = 600
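
Any of these defaults can be changed with qmgr. The following sketch of a possible site customization shortens the scheduling cycle and changes the account from which PBS mail is sent; the account name is hypothetical:

        % qmgr
        Qmgr: set server scheduler_iteration = 300
        Qmgr: set server mail_from = pbsadmin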

17.4.6 Configuring MOM

The execution server daemons, MOMs, require much less configuration than does the Server. The installation process creates a basic MOM configuration file that contains the minimum entries necessary in order to run PBS jobs. This section describes the MOM configuration file and explains all the options available to customize the PBS installation to your site.

The behavior of MOM is controlled via a configuration file that is read upon daemon initialization (startup) and upon reinitialization (when pbs_mom receives a SIGHUP signal). The configuration file provides several types of runtime information to MOM: access control, static resource names and values, external resources provided by a program to be run on request via a shell escape, and values to pass to internal functions at initialization (and reinitialization). Each configuration entry is on a single line, with the component parts separated by white space. If the line starts with a hash sign ("#"), the line is considered to be a comment and is ignored.

A minimal MOM configuration file should contain the following:

        $logevent 0x1ff
        $clienthost server-hostname

The first entry, $logevent, specifies the level of message logging this daemon should perform. The second entry, $clienthost, identifies a host that is permitted to connect to this MOM. Replace server-hostname with the name of the host on which the PBS Server (pbs_server) will run. Advanced MOM configuration options are described in the PBS Administrator Guide.
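
Beyond the minimal file, entries can define the static and shell-escape resources mentioned above. The sketch below uses hypothetical host names, resource names, and paths, and should be checked against the PBS Administrator Guide; the $restricted entry, which permits query-only connections from matching hosts, is likewise shown only as an example:

        $logevent 0x1ff
        $clienthost server.example.com
        $restricted *.example.com
        # static resource: a name and value separated by white space
        tapes 4
        # shell-escape resource: the command runs when the value is requested
        scratch !/bin/df -k /scratch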

17.4.7 Scheduler Configuration

Now that the Server and MOMs have been configured, we turn our attention to the PBS Scheduler. As mentioned previously, the Scheduler is responsible for implementing the local site policy regarding which jobs are run and on what resources. This section discusses the recommended configuration for a typical cluster. The full list of tunable Scheduler parameters and detailed explanation of each is provided in the PBS Administrator Guide.

The PBS Pro Scheduler provides a wide range of scheduling policies. It can sort jobs in dozens of different ways, including FIFO order, and can also sort on user and group priority. Queues are sorted by queue priority to determine the order in which they are considered. As distributed, the Scheduler is configured with the defaults shown in Table 17.6.

Table 17.6: Default scheduling policy parameters.

    Option                Default Value
    ------                -------------
    round_robin           False
    by_queue              True
    strict_fifo           False
    load_balancing        False
    load_balancing_rr     False
    fair_share            False
    help_starving_jobs    True
    backfill              True
    backfill_prime        False
    sort_queues           True
    sort_by               shortest_job_first
    smp_cluster_dist      pack
    preemptive_sched      True
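
These parameters live in the Scheduler's configuration file (by default, sched_config in the Scheduler's sched_priv directory). The entries below sketch how a site might adjust the policy; the trailing field selects prime-time/non-prime-time applicability, and the exact file location and syntax should be verified against the PBS Administrator Guide:

        # consider queues one at a time, sorting jobs shortest-first
        by_queue: True   all
        sort_by: shortest_job_first   all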

Once the Server and Scheduler are configured and running, job scheduling can be initiated by setting the Server attribute scheduling to a value of true:

        # qmgr -c "set server scheduling=true"

The value of scheduling is retained across Server shutdowns and restarts. Once the Server is configured, it may be placed into service.



