CustomerB was in the midst of implementing a new application, performing stress tests before going into production. While performing the stress test, the database server could not ramp up to the desired number of client applications while achieving the required response times.
The three parameters in the database manager configuration that stand out are highlighted in bold below. In this case, the application is mainly an online application; therefore, intra-partition parallelism should be disabled. Because there will be a large number of concurrent applications, the sort heap threshold (SHEAPTHRES) will need to be high; however, the existing value is sufficient.
Node type = Database Server with local and remote clients Database manager configuration release level = 0x0900 CPU speed (millisec/instruction) (CPUSPEED) = 8.580921e-07 Max number of concurrently active databases (NUMDB) = 8 Data Links support (DATALINKS) = NO Federated Database System Support (FEDERATED) = YES Transaction processor monitor name (TP_MON_NAME) = Default charge-back account (DFT_ACCOUNT_STR) = Java Development Kit 1.1 installation path (JDK_PATH) = Diagnostic error capture level (DIAGLEVEL) = 3 Notify Level (NOTIFYLEVEL) = 3 Diagnostic data directory path (DIAGPATH) = /products/db2/fsprdi/sqllib/db2dump Default database monitor switches Buffer pool (DFT_MON_BUFPOOL) = OFF Lock (DFT_MON_LOCK) = OFF Sort (DFT_MON_SORT) = OFF Statement (DFT_MON_STMT) = OFF Table (DFT_MON_TABLE) = OFF Unit of work (DFT_MON_UOW) = OFF Monitor health of instance and databases (HEALTH_MON) = OFF SYSADM group name (SYSADM_GROUP) = DBA SYSCTRL group name (SYSCTRL_GROUP) = DBASCTL SYSMAINT group name (SYSMAINT_GROUP) = DBAMNT Database manager authentication (AUTHENTICATION) = SERVER Cataloging allowed without authority (CATALOG_NOAUTH) = NO Trust all clients (TRUST_ALLCLNTS) = YES Trusted client authentication (TRUST_CLNTAUTH) = CLIENT Use SNA authentication (USE_SNA_AUTH) = NO Bypass federated authentication (FED_NOAUTH) = NO Default database path (DFTDBPATH) = /products/db2/fsprdi Database monitor heap size (4KB) (MON_HEAP_SZ) = 56 UDF shared memory set size (4KB) (UDF_MEM_SZ) = 256 Java Virtual Machine heap size (4KB) (JAVA_HEAP_SZ) = 2048 Audit buffer size (4KB) (AUDIT_BUF_SZ) = 0 Backup buffer default size (4KB) (BACKBUFSZ) = 1024 Restore buffer default size (4KB) (RESTBUFSZ) = 1024 Sort heap threshold (4KB) (SHEAPTHRES) = 100000 Directory cache support (DIR_CACHE) = YES Application support layer heap size (4KB) (ASLHEAPSZ) = 15 Max requester I/O block size (bytes) (RQRIOBLK) = 32767 Query heap size (4KB) (QUERY_HEAP_SZ) = 16000 DRDA services heap size (4KB) (DRDA_HEAP_SZ) = 128 Priority of agents (AGENTPRI) = SYSTEM Max number of existing agents (MAXAGENTS) = 200 Agent pool size (NUM_POOLAGENTS) = 4 (calculated) Initial number of agents in pool (NUM_INITAGENTS) = 0 Max number of coordinating agents (MAX_COORDAGENTS) = MAXAGENTS Max no. of concurrent coordinating agents (MAXCAGENTS) = MAX_COORDAGENTS Max number of logical agents (MAX_LOGICAGENTS) = MAX_COORDAGENTS Keep fenced process (KEEPFENCED) = YES Number of pooled fenced processes (FENCED_POOL) = MAX_COORDAGENTS Initialize DARI process with JVM (INITFENCED_JVM) = NO Initial number of fenced DARI process (NUM_INITFENCED) = 0 Index re-creation time (INDEXREC) = RESTART Transaction manager database name (TM_DATABASE) = 1ST_CONN Transaction resync interval (sec) (RESYNC_INTERVAL) = 180 SPM name (SPM_NAME) = it_ibm60 SPM log size (SPM_LOG_FILE_SZ) = 256 SPM resync agent limit (SPM_MAX_RESYNC) = 20 SPM log path (SPM_LOG_PATH) = TCP/IP Service name (SVCENAME) = 50000 Discovery mode (DISCOVER) = SEARCH Discovery communication protocols (DISCOVER_COMM) = TCPIP Discover server instance (DISCOVER_INST) = ENABLE Maximum query degree of parallelism (MAX_QUERYDEGREE) = ANY Enable intra-partition parallelism (INTRA_PARALLEL) = NO No. of int. communication buffers(4KB)(FCM_NUM_BUFFERS) = 1024 Node connection elapse time (sec) (CONN_ELAPSE) = 10 Max number of node connection retries (MAX_CONNRETRIES) = 5 Max time difference between nodes (min) (MAX_TIME_DIFF) = 60 db2start/db2stop timeout (min) (START_STOP_TIME) = 10
The three parameters in the database manager configuration that stand out are highlighted in bold below.
Because the applications will be performing a lot of inserts/updates/deletes, the log buffer size (LOGBUFSZ) will be important. However, the log buffer is sized adequately for this workload.
Because this is an online application, the sort list heap (SORTHEAP) should not be too large. When it is too large, the DB2 optimizer will tend to favor sorts over index scans, and with a large number of concurrent applications, sorting will be detrimental to performance.
In this case, the default buffer pool size (BUFFPAGE) is 1,000 4-KB pages. To determine the real size of the buffer pool, use the following statement:
select * from syscat.bufferpools
In this case, there is only one buffer pool (IBMDEFAULTBP), and it has a size of 5,000 4-KB pages.
This is a rather small buffer pool; therefore, increasing the size of the buffer pool likely will help performance.
Database configuration release level = 0x0a00 Database release level = 0x0a00 Database territory = US Database code page = 819 Database code set = ISO8859-1 Database country/region code = 1 Dynamic SQL Query management (DYN_QUERY_MGMT) = DISABLE Discovery support for this database (DISCOVER_DB) = ENABLE Default query optimization class (DFT_QUERYOPT) = 5 Degree of parallelism (DFT_DEGREE) = 1 Continue upon arithmetic exceptions (DFT_SQLMATHWARN) = NO Default refresh age (DFT_REFRESH_AGE) = 0 Number of frequent values retained (NUM_FREQVALUES) = 10 Number of quantiles retained (NUM_QUANTILES) = 20 Backup pending = NO Database is consistent = NO Rollforward pending = NO Restore pending = NO Multi-page file allocation enabled = NO Log retain for recovery status = NO User exit for logging status = NO Data Links Token Expiry Interval (sec) (DL_EXPINT) = 60 Data Links Write Token Init Expiry Intvl(DL_WT_IEXPINT) = 60 Data Links Number of Copies (DL_NUM_COPIES) = 1 Data Links Time after Drop (days) (DL_TIME_DROP) = 1 Data Links Token in Uppercase (DL_UPPER) = NO Data Links Token Algorithm (DL_TOKEN) = MAC0 Database heap (4KB) (DBHEAP) = 1200 Size of database shared memory (4KB) (DATABASE_MEMORY) = AUTOMATIC Catalog cache size (4KB) (CATALOGCACHE_SZ) = (MAXAPPLS*4) Log buffer size (4KB) (LOGBUFSZ) = 256 Utilities heap size (4KB) (UTIL_HEAP_SZ) = 5000 Buffer pool size (pages) (BUFFPAGE) = 1000 Extended storage segments size (4KB) (ESTORE_SEG_SZ) = 16000 Number of extended storage segments (NUM_ESTORE_SEGS) = 0 Max storage for lock list (4KB) (LOCKLIST) = 100 Max size of appl. group mem set (4KB) (APPGROUP_MEM_SZ) = 20000 Percent of mem for appl. group heap (GROUPHEAP_RATIO) = 70 Max appl. control heap size (4KB) (APP_CTL_HEAP_SZ) = 128 Sort heap thres for shared sorts (4KB) (SHEAPTHRES_SHR) = (SHEAPTHRES) Sort list heap (4KB) (SORTHEAP) = 1024 SQL statement heap (4KB) (STMTHEAP) = 4096 Default application heap (4KB) (APPLHEAPSZ) = 256 Package cache size (4KB) (PCKCACHESZ) = (MAXAPPLS*8) Statistics heap size (4KB) (STAT_HEAP_SZ) = 4384 Interval for checking deadlock (ms) (DLCHKTIME) = 10000 Percent. of lock lists per application (MAXLOCKS) = 10 Lock timeout (sec) (LOCKTIMEOUT) = -1 Changed pages threshold (CHNGPGS_THRESH) = 60 Number of asynchronous page cleaners (NUM_IOCLEANERS) = 10 Number of I/O servers (NUM_IOSERVERS) = 10 Index sort flag (INDEXSORT) = YES Sequential detect flag (SEQDETECT) = YES Default prefetch size (pages) (DFT_PREFETCH_SZ) = 32 Track modified pages (TRACKMOD) = OFF Default number of containers = 1 Default tablespace extentsize (pages) (DFT_EXTENT_SZ) = 32 Max number of active applications (MAXAPPLS) = AUTOMATIC Average number of active applications (AVG_APPLS) = 1 Max DB files open per application (MAXFILOP) = 64 Log file size (4KB) (LOGFILSIZ) = 1000 Number of primary log files (LOGPRIMARY) = 3 Number of secondary log files (LOGSECOND) = 2 Changed path to log files (NEWLOGPATH) = Path to log files = /databases/sample/logs Overflow log path (OVERFLOWLOGPATH) = Mirror log path (MIRRORLOGPATH) = First active log file = Block log on disk full (BLK_LOG_DSK_FUL) = NO Percent of max active log space by transaction(MAX_LOG) = 0 Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0 Group commit count (MINCOMMIT) = 1 Percent log file reclaimed before soft chckpt (SOFTMAX) = 100 Log retain for recovery enabled (LOGRETAIN) = OFF User exit for logging enabled (USEREXIT) = OFF Auto restart enabled (AUTORESTART) = ON Index re-creation time (INDEXREC) = SYSTEM (RESTART) Default number of loadrec sessions (DFT_LOADREC_SES) = 1 Number of database backups to retain (NUM_DB_BACKUPS) = 12 Recovery history retention (days) (REC_HIS_RETENTN) = 366 TSM management class (TSM_MGMTCLASS) = TSM node name (TSM_NODENAME) = TSM owner (TSM_OWNER) = TSM password (TSM_PASSWORD) =
The size of the buffer pool was increased as follows:
alter bufferpool IBMDEFAULTBP immediate size 500000
After increasing the size of the buffer pool, the response time improved, and more concurrent users were able to connect to the database and run the applications; however, the system was still unable to attain the required number of applications.
This requires operating system tools such as vmstat, iostat, and/or top to capture the memory, I/O, and process level information. It is important to capture the snapshots from these tools over a period of time, not taking just one snapshot, especially because the first line of the output of the vmstat and iostat tools contains average information since the server was started and is not really useful in analyzing a problem.
While the applications are running, and particularly when the system begins to stop responding, capture iostat output, as follows:
iostat 5 > iostat.out
Next, examine the iostat.out file, below:
Disks: % tm_act Kbps tps Kb_read Kb_wrtn Hdisk0 2.6 13.2 7.3 19 0 Hdisk1 1.8 9.2 7.3 19 0 Hdisk2 1.8 9.2 7.3 19 0 Hdisk3 1.8 9.2 7.3 19 0 Hdisk4 1.8 9.2 7.3 19 0 Hdisk5 42.8 213.2 107.3 237 286 Hdisk6 1.8 9.2 7.3 19 0 Hdisk7 1.8 9.2 7.3 19 0 Hdisk8 1.8 9.2 7.3 19 0 Hdisk9 1.8 9.2 7.3 19 0 cd0 0.0 0.0 0.0 0 0 Disks: % tm_act Kbps tps Kb_read Kb_wrtn Hdisk0 2.6 13.2 7.3 19 0 Hdisk1 1.8 9.2 7.3 19 0 Hdisk2 1.8 9.2 7.3 19 0 Hdisk3 1.8 9.2 7.3 19 0 Hdisk4 1.8 9.2 7.3 19 0 Hdisk5 47.1 234.7 111.3 261 292 Hdisk6 1.8 9.2 7.3 19 0 Hdisk7 1.8 9.2 7.3 19 0 Hdisk8 1.8 9.2 7.3 19 0 Hdisk9 1.8 9.2 7.3 19 0 cd0 0.0 0.0 0.0 0 0 Disks: % tm_act Kbps tps Kb_read Kb_wrtn Hdisk0 2.6 13.2 7.3 19 0 Hdisk1 1.8 9.2 7.3 19 0 Hdisk2 1.8 9.2 7.3 19 0 Hdisk3 1.8 9.2 7.3 19 0 Hdisk4 1.8 9.2 7.3 19 0 Hdisk5 43.3 214.6 109.8 283 239 Hdisk6 1.8 9.2 7.3 19 0 Hdisk7 1.8 9.2 7.3 19 0 Hdisk8 1.8 9.2 7.3 19 0 Hdisk9 1.8 9.2 7.3 19 0 cd0 0.0 0.0 0.0 0 0
The iostat output shows that one disk is much busier than all of the other disks; therefore, it is important to understand what part of the database is physically stored on Hdisk5.
In this case, the disks are not striped volumes, they are just a bunch of disks (JBOD); therefore, use the operation system tools to determine what file system was created on Hdisk5, then analyze the database to determine what part of the database is stored on that file system.
The operating system tools indicate that the file system /tablespaces/sample/temp was created on Hdisk5. Based on the name of the file system, it appears to be where the temporary table space was created. This can be verified as follows:
Determine the table space ID for the temporary table space:
select tbspaceid, tbspace from syscat.tablespaces where datatype='T'
The output of the statement above is:
TBSPACEID TBSPACE ----------- ----------- 1 TEMPSPACE1 1 record(s) selected.
Determine the container definitions for the temporary table space as follows:
list tablespace containers for 1
The output of the statement above is:
Tablespace Containers for Tablespace 1 Container ID = 0 Name = /tablespaces/sample/temp Type = Path
Therefore, the disk with the most activity is the disk where the temporary table space has been placed.
Based on the iostat output, there are 10 physical disks in the server. To eliminate the I/O bottleneck and spread the I/O across as many disks as possible, the physical design of the database needed to be changed. Instead of isolating each of the table spaces to its own physical disk, each table space would be created with eight containers, with one container on each of the disks Hdisk2 through Hdisk9. In this case, because it was an online system, one disk was set aside for the database logs and another for the operating system paging space.
To change the physical layout of the database, an offline backup was taken, the database was dropped, and a redirected restore was performed. During the redirected restore, the table space definitions were changed.
After changing the table space definitions, the response time improved even more, and the desired number of concurrent applications were able to run successfully.
While the applications are running, capture iostat output as follows:
iostat 5 > iostat.out
Next examine the iostat.out file, below:
Disks: % tm_act Kbps tps Kb_read Kb_wrtn Hdisk0 2.6 13.2 7.3 19 0 Hdisk1 4.8 12.2 17.3 0 78 Hdisk2 12.1 29.2 27.2 109 134 Hdisk3 12.5 29.4 27.1 127 125 Hdisk4 12.2 29.1 27.5 113 130 Hdisk5 12.3 29.3 27.4 116 124 Hdisk6 12.4 29.3 27.4 124 126 Hdisk7 12.3 29.1 27.2 103 131 Hdisk8 12.4 29.4 27.3 108 132 Hdisk9 12.2 29.1 27.1 110 130 cd0 0.0 0.0 0.0 0 0
The iostat output now shows that the I/O is spread evenly across the data disks, and none of the disks is busy more than 15% of the time. This leaves room for the I/O subsystem to handle additional requests if needed.