Improving remote query performance by tuning FET_BUF_SIZE

I thought I’d write this blog post as a nice example of a case where tuning the client-side variable FET_BUF_SIZE really sped up a remote query.

FET_BUF_SIZE is documented by IBM in the context of a Java application using JDBC here and as a server environment variable here.

One thing the documentation warns about is that simply setting this to a high value may degrade performance, especially if you have a lot of connections. With that in mind here are some facts about the query I’m running and using as a basis for these tests:

  • I am just using a single connection to the database.
  • the query returns around 10000 rows and 60 Mb of data.
  • the client and the server are geographically separated from each other and Art Kagel’s dbping utility typically takes around 0.1 seconds to connect remotely; this compares with around 3 milliseconds locally.
  • crucially the query runs in seconds locally on the server but takes over three minutes when run remotely.
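
For reference, FET_BUF_SIZE was simply set in the client environment before connecting for these tests. A minimal sketch, assuming a dbaccess/ESQL-C client; the database and server names are illustrative, and the JDBC connection property shown in the comment is the documented equivalent for Java clients:

export FET_BUF_SIZE=32767
dbaccess mydb@remote_server

# For a JDBC client the same property goes in the connection URL, e.g. (illustrative host/port):
# jdbc:informix-sqli://dbhost:9088/mydb:INFORMIXSERVER=remote_server;FET_BUF_SIZE=32767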

If I begin running the query with the default value of FET_BUF_SIZE and monitor waits on the server, I can see that reads only go up slowly and that my session is waiting on a condition (indicated by the Y in position one of column two) more or less all the time:

> while [ 1 ] ; do
> onstat -u | grep thompson
> sleep 1
> done
Userthreads
address flags sessid user tty wait tout locks nreads nwrites
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 552 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 552 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 560 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 560 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 568 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 576 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 592 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 624 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 624 0

The sixth column (wait) shows the address of the condition my thread is waiting on. I can use onstat -g con (print conditions with waiters) to see that I’m waiting on the network:

> onstat -g con | grep -E '^cid|26e67cd298'
cid addr name waiter waittime
5789 26e67cd298 netnorm 84353 0

A quick check with onstat -g ses 76228 shows that thread id 84353 does indeed correspond to my session.

While the wait time shown above is not increasing it’s a different story when we look at netstat, again on the server:

> netstat -nc | grep '172.16.0.1'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1312 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1284 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1306 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1302 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1194 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1206 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1266 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1304 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1318 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1248 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED

What the above is showing us is that there are consistently around 1200 to 1300 bytes in the send queue (Send-Q). This is surely our bottleneck.

At this point in investigating the problem I considered modifying other parameters such as OPTOFC and Linux kernel parameters. However, a few moments’ thought made it clear these weren’t going to gain anything: OPTOFC optimises the open-fetch-close sequence, which for a single long-running query is not going to give us anything measurable; and increasing the Linux kernel parameter governing the send queue size was dismissed when we found that 1300 bytes was well below the maximum allowed.

In Informix 11.50 the maximum value of FET_BUF_SIZE is 32767 (32 kb), but in 11.70 and above this is increased to 2147483648 (2 Gb), or actually 2147483647 as we’ll see. We can therefore move on to experimenting with different values:

FET_BUF_SIZE | Query run time (s) | Average Send-Q size over 10 samples | Maximum Send-Q size observed
Default | 221.2 | 1274 | 1332
1024 | 221.1 | 1255 | 1326
2048 | 221.1 | 1285 | 1338
4096 | 221.2 | 1297 | 1360
6144 | 102.1 | 2564 | 2676
8192 | 56.6 | 5031 | 5210
16384 | 22.6 | 12490 | 13054
32767 (max. 11.50 value) | 11.5 | 24665 | 29968
65536 | 7.0 | 62188 | 62612
131072 | 4.9 | 115793 | 127826
262144 | 4.0 | 146686 | 237568
524288 | 3.5 | 184320 | 249856
1048576 | 3.3 | 245760 | 473616
2097152 | 3.2 | 249856 | 486352
2147483647 (max. value - 1) | 3.0 | 245760 | 549352
2147483648 (supposed max. value) | 221.3 | 1276 | 1366

As the run times get shorter it gets tricky to measure the Send-Q using netstat -nc: it can be sampled very frequently using a command like:

while [ 1 ] ; do
netstat -n | grep '172.16.0.1'
done

This will produce many measurements per second and with this it’s possible to see it fill up and drain several times in the period while the statement is running.

It’s also interesting to play around with the boundaries. For example, with a FET_BUF_SIZE between around 5500 and 5600, maximum Send-Q sizes the same as those consistently achieved with a FET_BUF_SIZE of 6144 begin to creep into the results, but many measurements remain around the values consistently measured with a FET_BUF_SIZE of 4096:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1316 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1318 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1278 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1352 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1288 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 2546 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1278 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 2502 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1266 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1314 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 2506 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1292 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED

So what are the conclusions?

  • Increasing FET_BUF_SIZE at the client side can dramatically improve the speed of remote queries.
  • Maximum Send-Q sizes, as measured by netstat, increase in discrete steps as FET_BUF_SIZE is increased.
  • A larger Send-Q allows more data to be cached and reduces waits seen in Informix.
  • To see any improvement at all FET_BUF_SIZE must be increased to at least 6000 (approximate value).
  • Around the boundaries between maximum Send-Q sizes there appears to be a cross-over region where, from one second to the next, maximum send queue sizes corresponding to two adjacent steps are both seen.
  • The maximum value allowed in 11.70 at least is 2147483647 and not 2147483648, as indicated in the documentation.
  • The maximum 11.50 value of 32767 produced a run time nearly 4x slower than an optimised value on 11.70+.
  • Other testing I did, not documented here, shows that the results are uniform across JDBC and ESQL/C applications.

Note: all user names, IP addresses and port numbers used in this post have been altered.


Large parallel index builds and temp space

This is a quick post about parallel index builds. Today I was building, with PDQPRIORITY set, an unfragmented detached index on a large table fragmented by range with ten large fragments, and I saw this message in the online log:

10:28:53 WARNING: Not enough temp space for parallel index build.
Space required = 110566014 pages; space available = 8385216 pages.
Partial index build started.

You can see I am quite a long way short of the temp space required here; I need just over thirteen times more.

In this instance I have eight temporary dbspaces available and all are listed in the DBSPACETEMP onconfig parameter and I have no local override. They are all 2 Gb and using a 16 kb page size so have 131072 pages each and, as I am in single user mode, I know they are all free. onstat -d confirms that 131019 pages of 131072 are free in each of them. In case it’s relevant I also have 1,027,203 2 kb pages free in rootdbs.

The first thing that confuses me is the 8,385,216 pages the online log message says are available, which is more than I actually have: 131019 * 8 = 1,048,152. I think this is a bug as it’s a factor of 8 out. It’s probably assuming a 2 kb page size somewhere and my 16 kb dbspaces are an 8x multiple of this. I am using Linux, so is Informix using native page size units and just not making it clear?

The index I am creating is on a bigint field and there are 7,076,224,823 rows. If I take the 110,566,014 pages to be 2 kb pages, that is around 210 Gb and the engine is calculating 32 bytes/row exactly (110,566,014 * 2048 / 7,076,224,823 = 32), which sounds about right for a bigint key once the row pointer and sort overhead are included.

Anyway despite the message in the online log I am comforted by this IBM support article which tells me:

You do not have to take action. This is a warning. The database server will create the index one fragment at a time, instead of all at once.

However, it does advise me that cancelling the statement, adding more temp space and starting again would be a good idea. This is prescient as we’ll see.

Analysing this now, it is probably going to fail somewhere because I need thirteen times more space and the engine can only divide the workload by working on a single fragment at a time. There are ten fragments and they are not all exactly the same size. In fact my largest fragment has 1,950,612,068 rows, 27% of the total, and based on 32 bytes/row the largest fragment I could handle would have only around 536,653,818 rows. I suspect this means that to succeed I will need at least 30,478,314 2 kb pages available to complete the build. I hope this all makes sense anyway!
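
This sort of pre-flight check can be scripted; here is a minimal sketch, assuming 2 kb pages, roughly 32 bytes of sort space per row as derived above, that sysmaster:sysptnhdr.nrows is reasonably current, and an illustrative table name:

-- Estimated 2 kb sort pages needed per fragment for the index build.
-- 'my_big_table' is illustrative; 32 bytes/row is the estimate derived above.
SELECT f.partn, p.nrows,
       ROUND((p.nrows * 32) / 2048) AS est_sort_pages_2k
FROM   sysfragments f, sysmaster:sysptnhdr p
WHERE  f.tabid = (SELECT tabid FROM systables WHERE tabname = 'my_big_table')
AND    f.fragtype = 'T'
AND    p.partnum = f.partn
ORDER BY p.nrows DESC;

Comparing the largest fragment’s estimate with the free pages in your temporary dbspaces would have flagged this failure before the four hours were spent.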

Foolhardily and possibly because I get distracted by something I leave it to run. More messages appear in the log as the build progresses:

11:22:33 WARNING: Not enough temp space for parallel index build.
Space required = 110566014 pages; space available = 8385216 pages.
Partial index build started.
12:19:28 WARNING: Not enough temp space for parallel index build.
Space required = 110566014 pages; space available = 8385216 pages.
Partial index build started.
13:27:03 WARNING: Not enough temp space for parallel index build.
Space required = 110566014 pages; space available = 8385216 pages.
Partial index build started.
13:47:56 Session Insufficient space in temporary dbspaces:
Creating the temporary table in the root dbspace,
Temporary table size is 17632 pages.

Nearly four hours after it began at 14:27:41 my index build fails with:

212: Cannot add index.
179: ISAM error: no free disk space for sort

Harumph.

I guess there are two annoying things about this:

  1. The support article is only right if your largest fragment will not require more space than is available.
  2. The failure could have been foreseen at the beginning by looking at row counts.

Anyway, I hope this helps. If I get time I will do some more testing on this to confirm some of the assumptions I have made in writing this article. Feedback is welcome as ever (via https://informixdba.wordpress.com for those of you reading this on PlanetIDS)!


Experience with Auto Update Statistics (AUS)

Introduction

This is based on my previous blog post, Working with auto update statistics which I’ve expanded and significantly improved. I presented this at the Informix International User Group Conference 2015.

Let’s start at the beginning. Why do we run UPDATE STATISTICS at all? When we write an SQL query and send it to the database engine to execute, there may be several ways that the engine can run the query. For example, if there are two tables the engine can start with the first table and join the second to it, or begin with the second and join the first. There may also be two filter conditions, one of which may be very specific and pick out only a few rows; the other may be very general. It should be apparent that some ways are more efficient than others, sometimes by several orders of magnitude.

Informix uses a cost-based optimizer to determine how to run queries. This relies on metadata to provide information about how large tables are, how many distinct values there are and other information about your data. We call these pieces of information statistics and if we also have a histogram of a column showing the abundance of specific values or ranges we call this a distribution.

The optimizer looks at possible query plans and chooses the one with the lowest costs assigned to it, according to the statistics and distributions. This may or may not be the best plan possible; if not you may see poor performance. Inaccurate cost predictions could be because your statistics are inadequate or out of date.

Maintaining the statistics and distributions is a key DBA responsibility.

What statistics and distributions should you maintain? The purpose is to ensure the optimizer selects the most efficient query plan for your queries.

These query plans should be stable over time. Normally stability is achieved through not changing things: this is why your Change Manager likes to say no (sometimes). However, with UPDATE STATISTICS stability comes from regularly refreshing your statistics and distributions: this is a change to your system you may be doing daily.

The Forth Bridge: a slide from my presentation to the IIUG 2015 Conference.

What statistics do you need? The Informix Performance Guide offers a general set of recommendations. However:

  • They may be more than you need and therefore more to maintain, which could be a headache with a large system.
  • In some specific cases they may not be good enough. The result of this can be an application codebase full of query directives (instructions to the optimizer to run queries in a particular way).
  • The guide doesn’t say much about how frequently you should run UPDATE STATISTICS.

Statistics and distributions

Statistics:
  • Updated by UPDATE STATISTICS [LOW].
  • systables:
    • nrows: number of rows.
    • npused: number of pages used on disk.
    • ustlowts: when UPDATE STATISTICS was last run.
  • syscolumns, for indexed columns only:
    • colmin: the second lowest value.
    • colmax: the second highest value.
  • sysindices:
    • levels: number of B-Tree levels.
    • leaves: number of leaves.
    • nunique: number of unique values in the first column.
    • clust: incremented when values in the index are different to the last; the maximum value is the number of rows and a low number indicates lots of duplicates.
    • nrows: number of rows.
    • ustlowts: when UPDATE STATISTICS was last run.
    • ustbuildduration: time to build index statistics.

Distributions:
  • Updated by UPDATE STATISTICS MEDIUM or HIGH.
  • Consist of histograms for values or value ranges in equally sized buckets in sysdistrib.
  • Fragment-level statistics are stored in sysfragdist.

Auto Update Statistics (AUS) basics

Auto Update Statistics (AUS) consists of two database scheduler jobs. These are stored in the sysadmin database in table ph_task with configuration settings in ph_threshold.

Auto Update Statistics Evaluation
This calls a set of stored procedures which populate the sysadmin:aus_command table with a prioritized list of UPDATE STATISTICS commands to run. The code for these procedures is in $INFORMIXDIR/etc/sysadmin/sch_aus.sql
Auto Update Statistics Refresh
This is a UDR that does the work of calling the UPDATE STATISTICS commands. In older versions this was done with SPL code. Auto Update Statistics Refresh cannot be called even manually without the database scheduler running.

If your instance has a stop file ($INFORMIXDIR/etc/sysadmin/stop) to prevent the scheduler initialising with the instance, you must remove it. Before you do so it’s a good idea to review which jobs are enabled (tk_enable='t') in the ph_task table, as in the query below. (Using a stop file is a bad idea with 12.10 because running without the scheduler stops some system processes from functioning, so if you’re doing this, you ought to change it even if you don’t want to use AUS.)
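
A minimal query for that review; the columns used here also appear in the task definition later in this post:

SELECT tk_name, tk_enable
FROM sysadmin:ph_task
ORDER BY tk_name;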

One advantage of AUS is that it works on all of your databases, including sysmaster.

Instance parameters affecting UPDATE STATISTICS

Three onconfig parameters affect statistics maintenance, independently of the tool you use to update statistics:

  • AUTO_STAT_MODE: controls whether UPDATE STATISTICS only refreshes the statistics or distributions the engine considers stale.
  • STATCHANGE: the percentage change beyond which statistics are considered stale when AUTO_STAT_MODE is enabled. This can be overridden at table level.
  • USTLOW_SAMPLE: whether to use sampling for UPDATE STATISTICS LOW.
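
You can check the values currently in effect on a running instance with something like:

onstat -c | grep -E 'AUTO_STAT_MODE|STATCHANGE|USTLOW_SAMPLE'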

AUS specific parameters

AUS’s configuration settings stored in ph_threshold work independently of these system level settings.

These can be updated via SQL without using OpenAdmin Tool:

UPDATE
  sysadmin:ph_threshold
SET
  value='0'
WHERE
  name='AUS_CHANGE';

  • AUS_AGE: the number of days after gathering statistics that they are considered stale.
  • AUS_CHANGE: prioritises tables based on the percentage of rows that have changed. This is not to be confused with STATCHANGE.
  • AUS_PDQ: the PDQPRIORITY to use while updating statistics. If you are using Workgroup Edition set this to 0 to avoid annoying messages in the online log about not being allowed to use PDQ.
  • AUS_SMALL_TABLES: the small table row threshold. These tables are considered volatile and are updated on every run.
  • AUS_AUTO_RULES:
    • 0: AUS only maintains existing distributions and won’t ensure that a minimum set of distributions exists according to its rules. This allows you more fine-grained control of what you maintain. Be aware that creating an index automatically builds HIGH mode distributions on leading columns with data, so if you don’t want these you will need to drop them.
    • 1: where no distributions exist, AUS will create HIGH mode distributions on all leading index columns and MEDIUM mode distributions on other indexed columns.

For most systems ‘1’ is the value you should use. If in doubt, use this.

You’ll notice in the description for AUS_AUTO_RULES=1 that AUS does not automatically create any distributions on non-indexed columns. However it will maintain any existing distributions regardless of the value of AUS_AUTO_RULES, even if AUS wouldn’t have created them had they not already existed.

Non-indexed columns and distributions

If you are migrating from dostats you will have MEDIUM mode distributions on all non-indexed columns. It’s possible to drop these on individual columns using:

UPDATE STATISTICS LOW FOR TABLE <tab name> (<colname>) DROP DISTRIBUTIONS ONLY;

I’d test before doing anything like this so you may just choose to leave them in and let AUS maintain them. If UPDATE STATISTICS MEDIUM doesn’t take very long on your system this is probably the best choice.

Are these distributions important? My answer is that it depends but, more often than not, no. The Informix Performance Guide recommends you have them but this is just a general recommendation. At this point it’s important not to lose sight of the fact that the goal is not to have the most comprehensive, high-resolution and up to date statistics possible; it is to ensure you can guarantee that your queries always run efficiently. Given that we can’t be updating statistics and distributions on all columns all of the time some compromises are inevitable.

Often when running a query the main risk is the optimizer choosing to use a sequential scan instead of an index. This risk is greatly reduced, if not eliminated, if the onconfig parameter OPTCOMPIND is set to 0. The downside of this is that the optimizer won’t select a sequential scan when it is the best query plan available unless there are no other options.

In general distributions on a column are more useful if there is some skew in the data. However be aware that for non-indexed columns syscolumns.colmin and syscolumns.colmax are never populated by UPDATE STATISTICS LOW so the optimizer is truly blind about the data ranges without a distribution.

I’m going to run through an example now using this table:

people table:
  • person_id serial not null
  • name varchar(254) not null
  • gender char(1) not null
  • age int not null

Indices: ix_people1 (person_id), ix_people2 (age)

The table will be populated with random ersatz data as follows:

  • 1 million rows.
  • Ages evenly distributed between 5 and 84.
  • 75% female, 25% male.
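
For anyone wanting to reproduce something similar, here is a minimal sketch of the schema; how the million random rows are generated is left open (any script respecting the distributions above will do), and whether ix_people1 is actually unique is not stated in the original:

CREATE TABLE people (
  person_id SERIAL NOT NULL,
  name      VARCHAR(254) NOT NULL,
  gender    CHAR(1) NOT NULL,
  age       INT NOT NULL
);

CREATE INDEX ix_people1 ON people (person_id);
CREATE INDEX ix_people2 ON people (age);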

I’ll be running this query:

SELECT
  name
FROM
  people
WHERE
  age BETWEEN 18 AND ? AND
  gender=?

And tweaking these parameters:

  • Upper age limit in the query.
  • Gender in the query.
  • Whether I have a medium mode distribution on gender.
  • The value of OPTCOMPIND in the onconfig.

My results were as follows:

Upper age | Gender | Actual rows returned | Medium mode distribution on gender? | OPTCOMPIND | Estimated rows | Estimated cost | Plan
25 | f | 74817 | No | 0 | 9988 | 21104 | Index path
   |   |       |    | 2 |      | 21104 | Index path
   |   |       | Yes | 0 | 75539 | 21104 | Index path
   |   |       |     | 2 |       | 21104 | Index path
25 | m | 25061 | No | 0 | 9988 | 21104 | Index path
   |   |       |    | 2 |      | 21104 | Index path
   |   |       | Yes | 0 | 24539 | 21104 | Index path
   |   |       |     | 2 |       | 21104 | Index path
30 | f | 121748 | No | 0 | 16232 | 38477 | Index path
   |   |        |    | 2 |       | 33923 | Sequential scan
   |   |        | Yes | 0 | 122439 | 38477 | Index path
   |   |        |     | 2 |        | 33923 | Sequential scan
30 | m | 40572 | No | 0 | 16232 | 38477 | Index path
   |   |       |    | 2 |       | 33923 | Sequential scan
   |   |       | Yes | 0 | 39881 | 38477 | Index path
   |   |       |     | 2 |       | 33923 | Sequential scan

(Blank cells carry the value down from the row above, as in the original merged table.)

What conclusions can we draw from this example?

  • OPTCOMPIND was a determining factor, not the presence of a medium mode distribution on gender.
  • Having the distribution gave a much better estimate of the number of rows returned.
  • The optimizer never used a different plan for the male or female queries.

Of course this is one example and you may have some different ones.

Columns with incrementing values

Let’s illustrate a different point with another example.

Maintaining distributions on any sort of log or journal where there is a timestamp field can be a problem. The highest value in your distribution is wrong almost immediately after calculating it because new rows are being added all the time with later timestamps. This means that if you do a query over recently added data your distributions may tell the optimizer that there’s no data.

ledger table:
  • line_id serial not null
  • account_id int not null
  • creation_time datetime year to second not null
Indices: ix_ledger1 (line_id), ix_ledger2 (account_id, creation_time), ix_ledger3 (creation_time), ix_ledger4 (account_id)

for_processing table:
  • line_id int not null
  • creation_time datetime year to second not null
Indices: ix_fp1 (line_id), ix_fp2 (creation_time)

Both of these tables have over 1 million rows and new rows being added continuously.

I am going to run this query:

SELECT
  FIRST 5
  l.line_id,
  l.creation_time
FROM
  ledger l,
  for_processing f
WHERE
  f.line_id = l.line_id AND
  l.account_id = 50 AND
  f.creation_time BETWEEN '2015-02-02 17:00:00' AND '2015-02-02 21:00:00';

There are two conceivable ways to run this query:

  1. Use the index on creation_time on for_processing, join to the ledger table on line_id and then filter on the account_id column.
  2. Use the index on account_id on the ledger table, join by line_id and then filter on the creation_time column.

The risk with the first one is that a lot of rows are read, only to eliminate the vast majority of them when the account_id criterion is applied.

The optimizer may prefer to drive off creation_time, particularly if the distribution indicates there are no data past the time the distribution was gathered. This is because it believes (wrongly) that selecting a date range from the for_processing table in which it thinks there is no data at all, and therefore doing hardly any work, is more efficient than selecting out an account from the ledger table and then joining to for_processing.

This can be very slow for large date ranges. This is particularly true when there are a large number of accounts.

Can AUS help you here? Not really; this example is more to point out a danger. The risk of running into a problem like this can be massively increased if you use the default STATCHANGE value of 10, because here it is the age of the distributions that matters, not how much the data has changed.

My recommendation is:

  • In your onconfig either set AUTO_STAT_MODE to 0, or set STATCHANGE to 0 if AUTO_STAT_MODE is 1.
  • If there are tables for which this is not appropriate, override it at the table level:

ALTER TABLE <tabname> STATCHANGE <percent>;

In my view only skipping the update when there have been absolutely no changes at all is the only safe way.
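
As a concrete sketch of that recommendation; whether these parameters can also be changed dynamically with onmode -wf depends on your version, so treat that part as an assumption to verify:

# In the onconfig:
AUTO_STAT_MODE 1
STATCHANGE 0

# Possibly also dynamically, if your version supports it:
onmode -wf STATCHANGE=0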

There is an undocumented feature that can help here:

IC91678: Informix optimizer vulnerable to poor query response on incrementing columns.

However the fix is not switched on unless you set onconfig parameter: SQL_DEF_CTRL 0x2. This can be switched on dynamically with onmode -wm. With this switched on, date/time distributions are effectively extended into the future by up to 30 days. While the answer to the question how old can my statistics and distributions be is nearly always it depends, with this switched on there is a hard limit.

In Informix 12.10.FC5 the fix is included and is now the default behaviour.

The date or datetime column concerned must have a default value of TODAY or CURRENT. The code also compares the last timestamp in the distribution with the time the distribution was gathered. The two must be close together to activate the feature.

This fix also works on serial, serial8 and bigserial fields.

This feature is a little tricky to test because you must:

  • Populate the tables with a large volume of data.
  • Update the distributions.
  • Add some more data with later timestamps.
  • Wait a while!

Here are my results:

Date range | Within the distribution bounds | Beyond the distribution upper bound | Beyond the distribution upper bound with IC91678
Query drives off | account_id | creation_time | account_id
Costs: lead with account_id | 264 | 398 | 398
Costs: lead with creation_time | 79004 | 4 | 398
Estimated rows | 20 | 1 | 1
Actual rows | 10 | 1 | 1

Wildcard queries and insufficient distribution resolution

Another problem you may have with the standard distributions created by UPDATE STATISTICS HIGH is insufficient resolution. By default the data are divided into 200 equal-sized buckets (resolution 0.5) and this may not suffice for some wildcard queries. The optimizer may be grossly wrong in its estimates for the number of rows returned and this can be improved by increasing the number of buckets.

customer table:
  • customer_id serial not null
  • user_name varchar(50) not null
Indices: ix_customer1 (customer_id), ix_customer2 (user_name)

Again this table will be populated with random ersatz data as follows:

  • 9 million rows.
  • This gives a distribution bucket size of 45000 with 0.5 resolution.

And my query:

SELECT
  FIRST 1000
  customer_id,
  user_name
FROM
  customer
WHERE
  user_name LIKE 'BAN%'
ORDER BY
  customer_id;

Look at the query carefully. It’s the sort of query you might get from a web customer search form. The ORDER BY on customer_id is important because it gives the engine the option of avoiding sorting any data if this index is used to select the data. If the optimizer thinks the user_name criterion is not very selective, i.e. there are a lot more than 1000 customers whose user name begins with the letters BAN and it will find them quickly without reading many rows, it may prefer this query plan.

There are two credible plans for this query:

  1. Use the index on user_name, read all the rows it points to and then sort on customer_id, return the first 1000.
  2. Use the unique index on customer_id, read the rows it points to, filter, stop at 1000 rows. This plan does not require a sort.

There is a third option of scanning the entire table but it’s unlikely the optimizer will choose this.
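
One way to see the cost of both candidate plans, and the way the "lead with" rows in the tables below can be reproduced, is to force each access path with optimizer directives and compare the output of SET EXPLAIN ON. A hedged sketch using the index names defined above; I am not claiming this is exactly how the original figures were captured:

-- Plan 1: drive off ix_customer2 (user_name), then sort on customer_id
SELECT {+INDEX(customer ix_customer2)}
  FIRST 1000 customer_id, user_name
FROM customer
WHERE user_name LIKE 'BAN%'
ORDER BY customer_id;

-- Plan 2: drive off ix_customer1 (customer_id), filter on user_name, no sort needed
SELECT {+INDEX(customer ix_customer1)}
  FIRST 1000 customer_id, user_name
FROM customer
WHERE user_name LIKE 'BAN%'
ORDER BY customer_id;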

Let’s look at the results of running this query with a standard HIGH mode distribution on user_name, a second distribution with ten times the resolution, and no distribution at all.

Variable | HIGH, 0.5 resolution | HIGH, 0.05 resolution | No distributions
Selected index (no directive) | customer_id | user_name | customer_id
Costs: lead with customer_id | 3480 | 313249 | 174
Costs: lead with user_name | 567525 | 1214 | 12435040
Estimated rows | 90000 | 1000 | 1800000
Actual rows | 1028 | 1028 | 1028

This example is real but also carefully selected because different three letter combinations may give different and more sane results, even with the standard distribution resolution.

If instead I run this query:

SELECT
  FIRST 1000
  customer_id,
  user_name
FROM
  customer
WHERE
  user_name LIKE 'TDM%'
ORDER BY
  customer_id;

I get:

Variable | HIGH, 0.5 resolution | HIGH, 0.05 resolution | No distributions
Selected index (no directive) | user_name | user_name | customer_id
Costs: lead with customer_id | 135723 | 313249 | 174
Costs: lead with user_name | 2909 | 1214 | 12435040
Estimated rows | 2308 | 1000 | 1800000
Actual rows | 801 | 801 | 801

AUS makes maintaining custom resolutions (or confidence levels with UPDATE STATISTICS MEDIUM) very easy. Simply run UPDATE STATISTICS manually to gather the desired distribution. AUS will maintain this resolution for you.
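
For example, to create the higher-resolution distribution used in the test above (after which AUS will refresh it at the same resolution):

UPDATE STATISTICS HIGH FOR TABLE customer (user_name) RESOLUTION 0.05;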

How frequently should I run UPDATE STATISTICS?

A common question a DBA may ask is how often is it necessary to UPDATE STATISTICS? A straightforward if unhelpful answer is often enough to ensure efficient query plans. More specifically:

  • On tables without columns with incrementing values, by which I mean that new values lie inside the range of the existing data, or where these columns are not used in the selection criteria for queries, it may be safe to use STATCHANGE (with AUTO_STAT_MODE) to regulate the frequency based on percentage change at the table level.
  • With incrementing values the working set can quickly get beyond the min/max values in the statistics or distributions. Here I’d recommend being much more aggressive and update based on age, regardless of the volume of change. This especially applies to your distributions.

If your system is small enough that you can run a full UPDATE STATISTICS every day there is no harm in doing this. It is probably overkill but it is one way of playing safe. To do this set AUS_CHANGE to 0 and make sure both the scheduler and the evaluator run daily. For larger systems you do need to be more selective about how often you run UPDATE STATISTICS.

Monitoring AUS

Does AUS just work? Well yes, it was aimed at the embedded market and it is pretty robust. On a larger system there is more to go wrong and so I’d recommend you check that:

  • the parameters are set correctly (in case they change if you rebuild the sysadmin database, for example).
  • there are no errors running UPDATE STATISTICS.
  • all statistics and distributions are sufficiently up to date.
  • the window you give the refresh job to run in is long enough.
Taking each check in turn:

Correct parameters:
  • Job scheduler running and dbScheduler thread exists.
  • Evaluator and refresh tasks are enabled and have the expected schedules.
  • AUS_AGE, AUS_CHANGE, AUS_AUTO_RULES, AUS_PDQ and AUS_SMALL_TABLES set correctly.
No errors:
  • No errors reported in the aus_command table (aus_cmd_state='E').
Statistics and distributions sufficiently up to date:
  • Query system tables to get the age of statistics and distributions.
  • Fairly complex check, particularly if using STATCHANGE>0.
  • Then need to consider ninserts, nupdates and ndeletes columns.
Statistics:
Find tables where systables.ustlowts is older than your threshold and then check whether the table appears in the result set of this SQL:

SELECT {+ORDERED}
  st.tabname
FROM
  sysdistrib sd,
  systables st,
  sysmaster:sysptnhdr sp
WHERE
  sd.tabid = st.tabid AND
  st.partnum = sp.partnum AND
  sd.tabid > 99 AND
  st.ustlowts > <days old> AND
  (sp.ninserts - sd.ninserts) + (sp.nupdates - sd.nupdates) + (sp.ndeletes - sd.ndeletes) > 0 AND
  (
  st.statchange IS NOT NULL AND (
  st.statchange = 0 OR
  ((sp.ninserts - sd.ninserts) + (sp.nupdates - sd.nupdates) + (sp.ndeletes - sd.ndeletes))/st.nrows*100 > st.statchange
  ) OR
  (
  st.statchange IS NULL AND
  (
  <onconfig STATCHANGE parameter> = 0 OR
  ((sp.ninserts - sd.ninserts) + (sp.nupdates - sd.nupdates) + (sp.ndeletes - sd.ndeletes))/st.nrows*100 > <onconfig STATCHANGE parameter>)
  )
  )
GROUP BY
  st.tabname
UNION
SELECT {+ORDERED}
  st.tabname
FROM
  sysdistrib sd,
  systables st,
  sysfragments sf,
  sysmaster:sysptnhdr sp
WHERE
  sd.tabid = st.tabid AND
  st.tabid = sf.tabid AND
  sf.partn = sp.partnum AND
  sd.tabid > 99 AND
  st.statchange is not null AND
  sf.fragtype='T' AND
  st.ustlowts > <days old>
GROUP BY
  st.tabname
  HAVING
  (SUM(sp.ninserts) - AVG(sd.ninserts)) + (SUM(sp.nupdates) - AVG(sd.nupdates)) + (SUM(sp.ndeletes) - AVG(sd.ndeletes)) > 0 AND
  (
  MIN(st.statchange) = 0 OR
  (SUM(sp.ninserts) - AVG(sd.ninserts)) + (SUM(sp.nupdates) - AVG(sd.nupdates)) + (SUM(sp.ndeletes) - AVG(sd.ndeletes))/AVG(st.nrows)*100 > MIN(st.statchange)
  )

Distributions:
SELECT DISTINCT
  tabname,
  colname,
  TODAY - sd.constr_time,
  st.nrows
FROM
  sysdistrib sd JOIN
  systables st ON (sd.tabid = st.tabid) JOIN
  syscolumns sc ON (sd.tabid = sc.tabid AND sd.colno = sc.colno) JOIN
  sysmaster:sysptnhdr sph ON (sph.partnum = st.partnum)
WHERE
  sd.tabid > 99 AND
  (sph.ninserts - sd.ninserts) + (sph.nupdates - sd.nupdates) + (sph.ndeletes - sd.ndeletes) > 0 AND
  (
  st.statchange IS NOT NULL AND
  (
  st.statchange = 0 OR
  ((sph.ninserts - sd.ninserts) + (sph.nupdates - sd.nupdates) + (sph.ndeletes - sd.ndeletes))/st.nrows*100 > st.statchange
  ) OR
  (
  st.statchange IS NULL AND
  (
  <onconfig STATCHANGE parameter> = 0 OR
  ((sph.ninserts - sd.ninserts) + (sph.nupdates - sd.nupdates) + (sph.ndeletes - sd.ndeletes))/st.nrows*100 > <onconfig STATCHANGE parameter>)
  )
  ) AND
  (TODAY - DATE(sd.constr_time)) >= <days old>
ORDER BY 4, 3 DESC
Long enough window in which to run commands:
  • No pending commands at the end of the refresh task window (aus_command.aus_cmd_state='P');
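
Two minimal example queries for the error and pending-command checks; aus_cmd_state is the only column name assumed here:

-- Commands that failed:
SELECT * FROM sysadmin:aus_command WHERE aus_cmd_state = 'E';

-- Commands still pending at the end of the refresh window:
SELECT COUNT(*) FROM sysadmin:aus_command WHERE aus_cmd_state = 'P';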

UPDATE STATISTICS FOR PROCEDURE

How does AUS deal with stored procedure plans as stored in sysprocplan? Well it doesn’t directly and does not call UPDATE STATISTICS FOR PROCEDURE [procname].

My take on this is that routines referencing updated tables will be recompiled anyway the first time they are called after UPDATE STATISTICS has run. On a busy OLTP system this will probably happen before you have a chance to update procedure plans manually. If you do have reason to refresh them, you will need to do it manually, and if you do, don’t set PDQPRIORITY.

If your system does have dead time there may be a small advantage to running this but I don’t think it really matters that much.

Method for running UPDATE STATISTICS [LOW]

Let’s now look at how AUS actually calls UPDATE STATISTICS. As discussed earlier the evaluation task creates a list of commands to run and these are run by the refresh task exactly as you see them when you query the aus_command table.

Let’s start by considering UPDATE STATISTICS [LOW].

AUS simply calls UPDATE STATISTICS [LOW] as one command, without specifying any column names, and I have seen it suggested that this is a bad thing (I don’t agree). The popular alternative, dostats, runs UPDATE STATISTICS LOW separately for each index key. Performance-wise there is not as much difference as you might expect; I suspect this is because the data tend to be in the buffer cache on repeated calls. But is there any difference in the end result?

Using Informix 11.70.FC7W1 I performed the following test:

  • Created two identical empty tables with twelve indices and loaded both with the same data.
  • Ran AUS on one, dostats on the other.
  • Unloaded and compared systables, syscolumns and sysindices for both tables.

Apart from tabid, tabname, index names and build times, the unloads were identical. My conclusion is that there is no difference, and AUS is slightly faster because it does everything with one command in a single pass.

Performance of UPDATE STATISTICS [LOW]

In terms of performance there is actually very little you can do to influence the speed of UPDATE STATISTICS LOW. The parameters DBUPSPACE, PSORT_NPROCS and even PDQPRIORITY have no effect.

Setting USTLOW_SAMPLE in the onconfig file is the only performance optimisation available apart from general server tuning.

It is supposed to be possible to get parallel scan threads for all fragmented indices when PDQPRIORITY is set to at least 10, but I can’t reproduce this. It apparently only works when USTLOW_SAMPLE is switched off.

Performance of UPDATE STATISTICS MEDIUM or HIGH

For UPDATE STATISTICS HIGH and MEDIUM there are a few parameters that can influence performance and with the exception of PDQPRIORITY you can’t use any of them with AUS. In summary they are:

  • DBUPSPACE: three parameters in one controlling four things:
    • how much memory and disk are available for sorting when PDQPRIORITY is not set.
    • whether to use indices for sorting.
    • whether to print the explain plan.
  • DBSPACETEMP: overrides default onconfig value.
  • PSORT_DBTEMP: allows you to specify a filesystem instead of DBSPACETEMP.
  • PSORT_NPROCS: specifies number of threads for sorting.

Having done tests with these parameters:

  • The disk and memory parameters don’t have any effect if you set PDQPRIORITY. If you do, up to DS_TOTAL_MEMORY multiplied by the effective PDQPRIORITY (as a percentage) can be allocated.
  • Even if set, they only have the effect of restricting the number of columns that can be processed in a single pass so a lot of the time they make no difference. For a better understanding of how this works see John Miller III’s article on Understanding and Tuning Update Statistics.
  • Setting the explain output via the directive setting works a bit weirdly: you need to run SET EXPLAIN ON separately to enable the explain plan even if it is set here, but disabling it with this parameter does work.
  • The default is to use indices for sorting. Switching this off (directive 1 or 2) was up to 5x slower on my test system.
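
For reference, DBUPSPACE is an environment variable set in the session that runs UPDATE STATISTICS; the value used throughout these tests would be set like this (file name illustrative):

export DBUPSPACE=0:50:0    # disk : memory (Mb) : index-sort/explain directive, as described above
dbaccess mydb update_stats.sql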

Here are some performance tests showing the effects of different DBUPSPACE settings on UPDATE STATISTICS HIGH. My performance tests are all carried out on a powerful 32 core Linux x86_64 server with 256 Gb RAM and fast SAN storage (not my laptop). For my tests I did a dummy run first to try and even out any effects of caching in the SAN. I then did a run immediately after initialising my instance (cold) and a repeat run (warm) for each result.

10 million row non-partitioned table with 12 single-column indices:

DBUPSPACE | PDQPRIORITY | Memory allocated (Mb) | No. of scans | Indices used for sorting? | Light scans used? | Time elapsed cold (s) | Time elapsed warm (s)
1024:15:0 (default) | 0 | 15 | 12 | Yes | No | 111 | 97
0:50:0 | 0 | 50 | 12 | Yes | No | 108 | 97
0:50:1 | 0 | 50 | 12 | No | No | 259 | 254
0:50:0 | 100 | 2462.7 | 1 | Yes | Yes | 172 | 180
0:50:1 | 100 | 2462.7 | 1 | No | Yes | 176 | 172

74 million row table with 8-way round-robin partitioning and 3 single-column indices:

DBUPSPACE | PDQPRIORITY | Memory allocated (Mb) | No. of scans | Indices used for sorting? | Light scans used? | Time elapsed cold (s) | Time elapsed warm (s)
1024:15:0 (default) | 0 | 15 | 3 | Yes | No | 227 | 169
0:50:0 | 0 | 50 | 3 | Yes | No | 224 | 168
0:50:1 | 0 | 50 | 3 | No | No | 501 | 494
0:50:0 | 100 | 3064.7 | 1 | Yes | Yes | 432 | 426
0:50:1 | 100 | 3064.7 | 1 | No | Yes | 425 | 428

What conclusions can we draw from these results?

  • Using PDQPRIORITY actually makes it slower.
  • There is no significant difference between using 15 Mb of memory for sorts and 50 Mb of memory for sorts. I suspect this is because the number of scans is the same.
  • Using indices for sorting (the default) is faster than not using indices for sorting when not using PDQ.
  • The use of light scans (avoiding the buffer cache) reduces the variation between the cold and warm results.

Despite having a partitioned table and PDQPRIORITY set, the interesting thing here is that during the execution I observed no parallelism. You can see this for yourself by identifying the session running UPDATE STATISTICS and looking at the threads the session is using. I get something like this for my partitioned table:

IBM Informix Dynamic Server Version 11.70.FC7W1 -- On-Line -- Up 00:01:11 -- 228250944 Kbytes

session           effective                            #RSAM    total      used       dynamic 
id       user     user      tty      pid      hostname threads  memory     memory     explain 
14       informix -         -        28798    guebobdb 1        11563008   10603832   off 

Program :
/opt/informix-11.70.FC7W1/bin/dbaccess

tid      name     rstcb            flags    curstk   status
201      sqlexec  26e1fbbfc8       ---PR--  24752    running-

Memory pools    count 4
name         class addr              totalsize  freesize   #allocfrag #freefrag 
14           V     26e4c59040       139264     10008      176        12        
14_SORT_0    V     26e573d040       4108288    294896     34865      4         
14_SORT_1    V     26e5748040       3657728    326176     32934      4         
14_SORT_2    V     26e5730040       3657728    325936     32937      4         

name           free       used           name           free       used      
overhead       0          13152          scb            0          144       
opentable      0          10640          filetable      0          3016      
ru             0          600            misc           0          160       
log            0          16536          temprec        0          21664     
blob           0          832            keys           0          664       
ralloc         0          19320          gentcb         0          1592      
ostcb          0          2944           sort           0          1577072   
sqscb          0          20224          sql            0          72        
srtmembuf      0          817320         hashfiletab    0          552       
osenv          0          3000           buft_buffer    0          8736      
sqtcb          0          9680           fragman        0          3640      
shmblklist     0          8074432        

sqscb info
scb              sqscb            optofc   pdqpriority optcompind  directives
e28f91c0         e4c5a028         0        100         0           1         

Sess       SQL            Current            Iso Lock       SQL  ISAM F.E. 
Id         Stmt type      Database           Lvl Mode       ERR  ERR  Vers  Explain    
14         UPDATE STATIST testdb             DR  Wait 15    0    0    9.24  On         

Current SQL statement (6) :
  update statistics high for table large_8-way_table
    (id_col1,id_col2,date_col) distributions only force

Last parsed SQL statement :
  update statistics high for table large_8-way_table
    (id_col1,id_col2,date_col) distributions only force

In the above example we can see that the engine has allocated three sort memory pools, one each for the three columns we are updating in a single pass with PDQPRIORITY set.

If you do see parallel sort (PSORT) threads like this:

Program :
/opt/informix-11.70.FC7W1/bin/dbaccess

tid      name     rstcb            flags    curstk   status
370      sqlexec  26e1fbaf28       ---PR--  20688    running-
371      psortpro 26e1fde468       Y------  912      cond wait  incore    -
372      psortpro 26e1fcda68       Y------  912      cond wait  incore    -
373      psortpro 26e1fd88f8       Y------  912      cond wait  incore    -
374      psortpro 26e1fd1498       Y------  912      cond wait  incore    -
375      psortpro 26e1fc6608       Y------  912      cond wait  incore    -
376      psortpro 26e1fdc328       Y------  912      cond wait  incore    -
377      psortpro 26e1fd4ec8       Y------  912      cond wait  incore    -
378      psortpro 26e1fca038       Y------  912      cond wait  incore    -
379      psortpro 26e1fbe958       Y------  912      cond wait  incore    -
380      psortpro 26e1fdbad8       Y------  912      cond wait  incore    -
381      psortpro 26e1fcb0d8       Y------  912      cond wait  incore    -
382      psortpro 26e1fda1e8       Y------  912      cond wait  incore    -
383      psortpro 26e1fc97e8       Y------  912      cond wait  incore    -
384      psortpro 26e1fd5718       Y------  912      cond wait  incore    -
385      psortpro 26e1fcd218       Y------  912      cond wait  incore    -
386      psortpro 26e1fc2388       Y------  912      cond wait  incore    -
387      psortpro 26e1fd80a8       Y------  912      cond wait  incore    -
388      psortpro 26e1fd0c48       Y------  912      cond wait  incore    -
389      psortpro 26e1fc5db8       -------  1632     running-
390      psortpro 26e1fdf508       Y------  912      cond wait  incore    -
391      psortpro 26e1fd7858       Y------  912      cond wait  incore    -
392      psortpro 26e1fc6e58       Y------  912      cond wait  incore    -
393      psortpro 26e1fd35d8       Y------  912      cond wait  incore    -
394      psortpro 26e1fdcb78       -------  8        running-
395      psortpro 26e1fca888       -------  1632     running-
396      psortpro 26e1fbf9f8       -------  1632     running-
397      psortpro 26e1fcc9c8       -------  8        running-
398      psortpro 26e1fc1b38       -------  8        running-
399      psortpro 26e1fd1ce8       -------  8        running-

Memory pools    count 4
name         class addr              totalsize  freesize   #allocfrag #freefrag 
41           V     270a6cd040       954368     61560      438        53        
41_SORT_0    V     2706fc2040       3915776    2184       33415      10        
41_SORT_1    V     2706fbb040       4222976    14488      31589      27        
41_SORT_2    V     2706fbc040       4222976    14248      31592      27

It is either because PSORT_NPROCS is set in your environment or set in the database engine’s environment when it was started.

Let’s now look at the effect of PSORT_NPROCS. The only way to use this with AUS is to set PSORT_NPROCS when you start the database engine, which will of course affect all sessions.

Setting PSORT_NPROCS is the only way to get any parallel processing with UPDATE STATISTICS HIGH or MEDIUM. It has no effect on UPDATE STATISTICS LOW. Setting PDQPRIORITY only provides more memory and allows HIGH and MEDIUM mode distributions on multiple columns to be built in a single pass, if enough memory is available. There will be one SORT memory pool per column being processed regardless of PDQPRIORITY. Sorting with PSORT_NPROCS set can be faster as we’ll see now.

Carrying on with the same examples as above, I get these results:

10 million row non-partitioned table with 12 single-column indices:

DBUPSPACE | PDQPRIORITY | PSORT_NPROCS | Memory allocated (Mb) | No. of scans | Light scans used? | Time elapsed cold (s) | Time elapsed warm (s)
0:50:0 | 0 | Not set | 50 | 12 | No | 108 | 97
0:50:0 | 0 | 24 | 50 | 12 | No | 123 | 97
0:50:0 | 100 | Not set | 2462.7 | 1 | Yes | 172 | 180
0:50:0 | 100 | 24 | 2538.4 | 1 | Yes | 80 | 83

74 million row table with 8-way round-robin partitioning and 3 single-column indices:

DBUPSPACE | PDQPRIORITY | PSORT_NPROCS | Memory allocated (Mb) | No. of scans | Light scans used? | Time elapsed cold (s) | Time elapsed warm (s)
0:50:0 | 0 | Not set | 50 | 3 | No | 224 | 168
0:50:0 | 0 | 24 | 50 | 3 | No | 223 | 170
0:50:0 | 100 | Not set | 3064.7 | 1 | Yes | 432 | 426
0:50:0 | 100 | 24 | 3083.7 | 1 | Yes | 156 | 145

What conclusions can we draw this time?

  • The fastest UPDATE STATISTICS HIGH (or MEDIUM) performance is with PDQ and PSORT_NPROCS set. (Seasoned DBAs might have expected this result).
  • But it’s not much faster than running without either of these parameters set, probably with fewer server resources.
  • It’s worth bearing in mind that PDQ enables light scans which may avoid buffer cache churn.

Adding additional AUS Refresh tasks for greater throughput

There is another way to achieve parallelism with AUS and that is to add additional scheduler tasks so that more than one table can be worked on at once, for example:

INSERT INTO
  sysadmin:ph_task (
    tk_name,
    tk_description,
    tk_type,
    tk_dbs,
    tk_execute,
    tk_delete,
    tk_start_time,
    tk_stop_time,
    tk_frequency,
    tk_monday,
    tk_tuesday,
    tk_wednesday,
    tk_thursday,
    tk_friday,
    tk_saturday,
    tk_sunday,
    tk_group,
    tk_enable
  )
  VALUES (
    'Auto Update Statistics Refresh #2',
    'Refreshes the statistics and distributions which were recommended by the evaluator.',
    'TASK',
    'sysadmin',
    'aus_refresh_stats',
    '0 01:00:00',
    '03:11:00',
    '12:45:00',
    '1 00:00:00',
    't',
    't',
    't',
    't',
    't',
    'f',
    'f',
    'PERFORMANCE',
    't'
  );

It is vital that the task name begins with Auto Update Statistics Refresh as shown here otherwise some of the internal code that stops the evaluator from running at the same time as the refresh tasks will not work.

Think of it as a batch process where the goal is to update all your statistics and distributions, not doing an individual table as fast as possible.

I recommend this method if you need extra grunt! Then run without PDQ (AUS_PDQ = 0).

Pros and cons of using PSORT_DBTEMP

Let’s move on to consider another environment variable you can’t easily use: PSORT_DBTEMP. Is using a filesystem for sorts faster than temporary dbspaces? For this I am not going to do any performance tests, largely because a comparison between the local SATA disks used for filesystems and the battery-backed, part solid-state SAN used for the database storage on my server is not a fair fight.

If you want to use PSORT_DBTEMP with AUS, again you will need to set it in your environment when initialising the server and use it across your system.
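
A hedged sketch of what that looks like; the path is illustrative and must exist and be writable by the engine:

# In the environment of the user starting the engine, before oninit:
export PSORT_DBTEMP=/fast_local_filesystem/ifmx_sort
oninit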

The only definitive benefit of pointing to a filesystem using PSORT_DBTEMP instead of using DBSPACETEMP is that the Linux filesystem cache has an effect. This means that your temporary files may never be committed to disk, giving you a performance advantage. Another interesting alternative is to use a RAM disk.

Otherwise when considering whether to use a filesystem over temporary dbspaces, I would consider your hardware.

Recent UPDATE STATISTICS defects

Below are some defects relating to UPDATE STATISTICS that I have personally logged with IBM. All are general UPDATE STATISTICS issues and whether you use AUS or not does not affect your chances of hitting these. I have written some comments underneath.

IT06767 UPDATE STATISTICS HIGH ON TABLE BIGGER THAN 1.1 BIO ROWS CAN CAUSE EXCEPTION IN SRTQUICK
This defect can only be hit if you use PDQ and a large amount of memory is allocated.
IT06726 Assert Warning Invalid index statistics found when using statistics sampling with index having many deleted items
Requires USTLOW_SAMPLE to be switched on. It is fairly harmless but does write some garbage to sysfragments. Re-running UPDATE STATISTICS LOW usually fixes it.
IT02679 UPDATE STATISTICS HIGH ON A FRAGMENTED TABLE LEADS TO ERROR MESSAGE 9810
This can only be seen if using fragment level statistics.
IT05463 Fragment based update stats high consumes a huge amount of sblobspace
This can only be seen if using fragment level statistics. It is not worth trying to make your sbspace extremely large to work around this problem.
IT05639 -768: Internal error in routine mbmerge2bin when running update statistics on a fragmented table (running fragment-level statistics)

Another fragment level statistics one. With the fixes for IT02679, IT05463 and IT05639 this feature is solid. I wouldn’t enable this for a table unless you have all of these in your version.

Most of these defects are fixed in 12.10.FC5.

UPDATE STATISTICS performance summary

Overall then what seems to be the best balance between speed and use of resources on your system is:

  • In terms of AUS, leave AUS_PDQ at 0.
  • If you need more throughput add more refresh tasks to the scheduler.
  • Set DBUPSPACE=0:50:0 in the server environment.
  • That’s it.

I hope this has been useful. I’ll end with a quick summary of some of the more interesting performance points from this presentation which apply regardless of whether you use AUS or not:

  • Setting PDQPRIORITY only provides more memory for UPDATE STATISTICS HIGH or MEDIUM and does not provide parallel processing. It may even make it run slower.
  • If you want multi-threaded processing, this only works by setting PSORT_NPROCS and then only with UPDATE STATISTICS HIGH or MEDIUM. Because it does not require PDQ you can use this with workgroup edition.
  • In my tests only when used together do PDQPRIORITY and PSORT_NPROCS improve performance. PDQ does switch on light scans though, which avoid churning your buffer cache.
  • Not using indices for sorting for UPDATE STATISTICS HIGH or MEDIUM can be significantly slower.
  • The performance of UPDATE STATISTICS [LOW] can be improved by setting USTLOW_SAMPLE.

Temporary tables

Again, this is another blog post about nothing new at all, but an attempt to put my understanding of temporary tables down in a way that will help me when I refer back to it and hopefully be of wider use.

When looking at things like this a test system is always essential for checking things and finding some surprising results. If you don’t have one you can soon set one up with VMWare or Virtual Box and a copy of Informix Developer Edition. For this post I am going to set up a simple test instance with four specific dbspaces:

  • A logged rootdbs with plenty of free space, called rootdbs.
  • A logged dbspace for my data and indices, called datadbs.
  • A temporary dbspace without logging, called tmp2k_unlogged.
  • A temporary dbspace with logging, called tmp2k_logged.

What’s the difference between a temporary dbspace with logging and a normal dbspace? Nothing except that one appears in the DBSPACETEMP onconfig setting.
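A sketch of how the dbspaces for this sandbox might be created; chunk paths, offsets and sizes are illustrative, 2 kb is the default page size on this platform, and the -t flag is what marks a dbspace as temporary:

onspaces -c -d tmp2k_unlogged -t -p /informix_data/sandbox/tmp2k_unlogged -o 0 -s 1024
onspaces -c -d tmp2k_logged -p /informix_data/sandbox/tmp2k_logged -o 0 -s 1024

# Then list both in the onconfig:
# DBSPACETEMP tmp2k_unlogged:tmp2k_logged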

Consider the SQL statements which explicitly create temporary tables in a session:

create temp table unlogged (col1 int) with no log;
insert into unlogged values (1);

and:

create temp table logged (col1 int);
insert into logged values (1);

For these tests we must have TEMPTAB_NOLOG set to 0 in our onconfig otherwise the second statement will silently have a with no log criterion added to it.
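
You can confirm the current setting with, for example:

onstat -c | grep TEMPTAB_NOLOG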

Let’s run these, use oncheck -pe to see which dbspace they get placed in and then use onlog -n <starting log unique identifier> to see if changes to these tables get logged or not:

DBSPACETEMP setting | Unlogged table: dbspace used | Unlogged table: logged operations? | Logged table: dbspace used | Logged table: logged operations?
tmp2k_unlogged:tmp2k_logged | tmp2k_unlogged | no | tmp2k_logged | yes
tmp2k_unlogged | tmp2k_unlogged | no | datadbs | yes
tmp2k_logged | tmp2k_logged | no | tmp2k_logged | yes
NONE | datadbs | no | datadbs | yes

So already a few interesting results drop out:

  1. We can specify both logged and unlogged dbspaces using the DBSPACETEMP parameter.
  2. The engine will prefer logged dbspaces for logged tables and unlogged dbspaces for unlogged tables.
  3. If an unlogged dbspace is not available the engine can use a logged dbspace and create unlogged tables in it.
  4. If a logged dbspace is not available the engine will use an alternative logged dbspace, because an unlogged temporary dbspace supports no logging at all. In this case it has chosen datadbs, because this is the dbspace in which I created my database.

At this point it’s worth referring to an IBM technote on this subject. This suggests some more tests but already my results are not in agreement with the first example given:

If we have created a dbspace named tmpdbs, but we could not see it was marked as ‘T’ in the result of onstat -d. We set DBSPACETEMP configuration parameter to tmpdbs. On this condition, tmpdbs will be used for logged temporary tables. That means if a temp table is created with ‘WITH NO LOG’ option, the server will not use it.

This is implying that my tmp2k_logged dbspace (it will not have the ‘T’ flag) cannot be used for unlogged temporary tables. You can see from my table that this isn’t true and I invite you to test this for yourself.

As part proof here is the onstat -d and oncheck -pe output:

$ onstat -d | grep tmp2k
7f9d7310         3        0x60001    3        1        2048     N  BA    informix tmp2k_logged
7f9d74b8         4        0x42001    4        1        2048     N TBA    informix tmp2k_unlogged


DBspace Usage Report: tmp2k_logged        Owner: informix  Created: 03/08/2015


 Chunk Pathname                             Pagesize(k)  Size(p)  Used(p)  Free(p)
     3 /informix_data/sandbox/tmp2k                   2      512       61      451

 Description                                                   Offset(p)  Size(p)
 ------------------------------------------------------------- -------- --------
 RESERVED PAGES                                                       0        2
 CHUNK FREELIST PAGE                                                  2        1
 tmp2k_logged:'informix'.TBLSpace                                     3       50
 db1:'thompsonb'.unlogged                                            53        8
 FREE                                                                61      451

 Total Used:       61
 Total Free:      451

Moving on, let’s do a slightly different test:

select * from sysmaster:systables into temp mytab_unlogged with no log;

and:

select * from sysmaster:systables into temp mytab_logged;

And the results:

DBSPACETEMP setting            Unlogged table                           Logged table
                               dbspace used      logged operations      dbspace used     logged operations
tmp2k_unlogged:tmp2k_logged    tmp2k_unlogged    no                     tmp2k_logged     yes
tmp2k_unlogged                 tmp2k_unlogged    no                     datadbs          yes
tmp2k_logged                   tmp2k_logged      no                     tmp2k_logged     yes
NONE                           datadbs           no                     datadbs          yes

Again my results are in disagreement with the IBM technote which says:

If DBSPACETEMP is not specified, the temporary table is placed in either the root dbspace or the dbspace where the database was created. SELECT…INTO TEMP statements place the temporary table in the root dbspace. CREATE TEMP TABLE statements place the temporary table in the dbspace where the database was created.

In both cases in my tests the engine chose datadbs and not my root dbspace.

Let’s move on a bit. How do I know what temporary tables are in use on my system as a whole?

One way is to run something like:

onstat -g sql 0 | grep -A 3 'User-created Temp tables'

This might get you something like this:

User-created Temp tables :
  partnum  tabname            rowsize 
  400003   mysystables2       500
  400002   mysystables        500

Another way is to run oncheck -pe and have a look at what is in your temporary dbspaces. Here you may also see space used by the engine for sorting, marked with SORTTEMP, or temporary tables created implicitly by the engine for query processing. However, whatever type of object it is, you will find it impossible to match anything you see to a particular session by this method; you can only match to a user name, which allows positive identification only if there is a single session per user.

There is another way to match tables to sessions which works for explicitly created temporary tables, for which I don’t want to claim any credit because I cribbed it from the IIUG Software Repository. The script is called find_tmp_tbls and in its present state it is broken when used with 11.70 (according to its README it hasn’t been tested since version 9.30.FC2): at least it does not work on 64-bit Linux, mainly because the syntax for onstat -g dmp seems to have changed slightly. I managed to fix it up, however.

It’s a little complicated to follow but the basic steps are this:

  1. You need to start with a given session and check if it has any temporary tables. (Unfortunately I don’t know a way of working backwards from the temporary table to see which session it belongs to either through onstat or the SMI interface.)
  2. Get the session’s rstcb value, either from onstat -g ses <sid> or from the first column in onstat -u.
  3. Run onstat -g dmp 0x<rstcb> rstcb_t | grep scb. Note that the rstcb value must be prefixed by 0x. This should return an scb value in hex.
  4. Take this value and run onstat -g dmp <scb> scb_t | grep sqscb. Your address must start with 0x again and this is true throughout all the examples. This will return two values; take the one labelled just sqscb.
  5. Feed this value into another dmp command: onstat -g dmp <sqscb> sqscb_t | grep dicttab. This will return another value.
  6. Finally take this and get the partnum(s) of the temporary tables for the session by running: onstat -g dmp <dicttab> "ddtab_t,LL(ddt_next)" | grep partnum.

Here is all that as an example:

$ onstat -g dmp 0x2aaaab608488 rstcb_t | grep scb
    scb          = 0x2aaaabf5f1c0
$ onstat -g dmp 0x2aaaabf5f1c0 scb_t | grep sqscb
    sqscb        = 0x2aaaad6c8028
    sqscb_poolp  = 0x2aaaad6c92c8
$ onstat -g dmp 0x2aaaad6c8028 sqscb_t | grep dicttab
    dicttab      = 0x2aaaad87c4f0
$ onstat -g dmp 0x2aaaad87c4f0 "ddtab_t,LL(ddt_next)" | grep partnum
    ddt_partnum  = 4194307
    ddt_partnum  = 4194306

It’s worth emphasising that this method will only work for explicitly created temporary tables. It won’t identify temporary space used by:

  • implicitly created temporary tables created by the engine to process a query.
  • temporary sort segments.

If there is a similar method for these types, I would be interested to find out about it.

Armed with the partnum you can do whatever you want with it like run this query against the sysmaster database to see what space is being used:

SELECT
  tab.owner,
  tab.tabname,
  dbsp.name dbspace,
  te_chunk chunk_no,
  te_offset offset,
  te_size size
FROM
  systabnames tab,
  systabextents ext,
  syschunks ch,
  sysdbspaces dbsp
WHERE
  tab.partnum in (4194306, 4194307) AND
  ext.te_partnum=tab.partnum AND
  ch.chknum=ext.te_chunk AND
  dbsp.dbsnum=ch.dbsnum
ORDER BY
  tab.owner,
  tab.tabname,
  te_extnum;

Giving results like:

owner      tabname        dbspace          chunk_no   offset   size
thompsonb  mysystables    tmp2k_unlogged   6          53       8
thompsonb  mysystables2   tmp2k_unlogged   6          61       8

For reference there is information about explicit and implicit temporary tables and temporary sort space in these tables in the sysmaster database:

  • sysptnhdr
  • sysptnext
  • sysptnbit
  • sysptntab
  • sysptprof
  • systabextents
  • systabinfo
  • systabnames
  • systabpagtypes

So in conclusion I hope this post brings together some useful information about explicit temporary tables. Personally I’d like to be able to get a complete picture of which sessions and which statements are using temporary space, which this doesn’t give. If I find anything it will be the subject of a future blog post.


Working with auto update stats

This article is superseded by my more comprehensive post, Experience with Auto Update Statistics (AUS).

This article has been written based on experience with version 11.70.FC5W1 and assumes some knowledge of stats and distributions and how they affect query optimisation. I appreciate any feedback in the comments section on my WordPress blog.

Auto update stats was introduced in 11.50.xC1 and, while being partly aimed at the embedded market, meant for the first time there was a complete solution to gathering database statistics bundled inside the product.

Several other tools exist to help with gathering statistics, for example AGS’s Server Studio can produce a set of UPDATE STATISTICS commands in a script to run against your database. Anecdotally, most DBAs use Art Kagel’s dostats utility, packaged up in the utils2_ak package available from the IIUG software repository. Dostats is pretty damn good although it’s not a complete solution as some scripts are needed to control it. It comes with a partner utility, drive_dostats, to do this but many DBAs, including myself, have written their own. Because dostats is the de-facto standard, I’ll refer to it a fair bit in this article. Also version 11.70 has a number of enhancements that don’t require you to use auto update stats; I’ll cover these as well.

So if you’re happily using dostats or another method to manage statistics, should you consider changing to auto update stats? Should it be your method of choice for a new-build instance? Well maybe: this article will go through some of the advantages and things to be aware of.

Here are some of its advantages:

  • The whole solution is part of the database engine and supported by IBM support.
  • It provides a complete framework, working within defined maintenance windows and is highly configurable.
  • It can be managed through OAT, although this is not needed.
  • Auto update stats does less work and takes less time than many solutions because it does not (by default) gather distributions on non-indexed columns or do separate low stats for each entire index key.
  • It’s fully integrated with the enhancements to update stats introduced in version 11.70.
  • It works on all your databases, including the system ones.
  • Perhaps my favourite feature: if you make manual adjustments, like increasing the resolution of the distributions on a column, auto update stats notices this and maintains the distributions at the new resolution. Similarly, if you manually create a distribution on a column it will maintain this.

One reason some DBAs don’t use auto update stats is because it involves using the job scheduler, which I’m told had issues in early releases with high CPU usage. For this reason, many DBAs touch a file called $INFORMIXDIR/etc/sysadmin/stop to stop it starting when the engine comes online. With 11.70.FC5W1 we run the job scheduler without any issues. (As an aside, if you’re not using it, it’s worth looking at the jobs to see what you’re missing: post_alarm_message is particularly useful.)

So can you just enable the job scheduler and let auto update stats do its thing? Not really. The first thing to look at is these onconfig parameters, which are in 11.70 and take effect regardless of the statistics method used:

  • AUTO_STAT_MODE
  • STATCHANGE
  • USTLOW_SAMPLE

Using auto stat mode and a non-zero value for STATCHANGE is something you need to consider very carefully. Internally the engine keeps a count of the number of inserts, updates and deletes that occur on each table, something that Keshava Murthy covered in his blog. If these collectively do not exceed STATCHANGE percent of the row count, statistics or distributions are not updated. This applies even when you run an UPDATE STATISTICS command manually. Confusingly the command still returns ‘Statistics updated’ even when nothing is done; the only clue is that the prompt returns instantly. To get around this there is a new FORCE keyword for the UPDATE STATISTICS statement that reverts to the old behaviour.
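
As a sketch of what I mean (the table name is illustrative and I am recalling the placement of the keyword from the 11.70 documentation, so check the manual), FORCE is simply appended to the statement for the table you want refreshed regardless of the STATCHANGE threshold:

update statistics high for table mytab force;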

I find turning AUTO_STAT_MODE on and setting STATCHANGE to zero works quite well: this just skips tables where no updates, inserts or deletes have occurred.

You can set the value of STATCHANGE manually on individual tables with a fast-alter operation:

alter table <table> statchange <change_threshold>;

As it’s an update to systables, be aware an exclusive table lock is briefly needed.
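
For example, to set a 20 per cent change threshold on a hypothetical table called mytab:

alter table mytab statchange 20;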

I’m not keen on setting STATCHANGE to a non-zero value because we have a lot of tables with incrementing date/time fields, meaning query optimisation is often time-based. I would find the option to override the age-based AUS_AGE parameter on a per-table basis much more useful, something that can only be done by writing your own script. Fortunately, as auto update stats evaluates the tables with stale stats on a regular scheduled basis, any ad-hoc updates are taken into account in its scheduling.

Setting USTLOW_SAMPLE enables sampling for UPDATE STATISTICS LOW statements, which is generally a good thing and can dramatically reduce the time these statements take. It can also be overridden in your user environment (see the sketch after the warning below). Sampling generally works well as long as the table is not heavily skewed in some way: if Informix thinks it is, you’ll see messages like this in the online log:

Warning: update statistics low using sampling may generate inaccurate index statistics for index owner.index_name due to data skew

Whether this is an issue for you will depend very much on your queries.
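
Going back to the per-session override mentioned above, my understanding is that it can be done with SET ENVIRONMENT along these lines (the exact syntax is from memory, so verify it against the manual for your version):

set environment ustlow_sample 'off';  -- or 'on'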

The other major enhancement in 11.70 is fragment-level statistics, but I shan’t cover it in detail here. If your storage schema is compatible with it and your table access patterns mean that some table fragments are never updated, it’s extremely useful. Informix’s implementation is nice in that the table stats are still considered as a whole when the optimiser evaluates queries, so you don’t get into trouble with having no stats for new fragments.

Perhaps the most significant difference between dostats and auto update stats is that dostats gathers additional distributions on non-indexed columns using UPDATE STATISTICS MEDIUM. If you have such distributions already auto update stats will continue to maintain them but it won’t create any new ones. All automated tools are attempting to apply a set of general criteria and recommendations to all tables so there is no hard and fast rule about whether you need them. One case to consider is where you have some sort of status flag such as a boolean or a column with a limited set of allowed values, perhaps enforced by a check constraint. Here distributions could be useful where these columns are used as filter conditions in queries. Otherwise, I suspect that in a lot of cases they are not needed. You’ll need to decide what is appropriate for your system.

Dostats also gathers low statistics separately for different indices, which takes extra time, but in my tests using version 11.50.FC9W2 this didn’t make any difference to the end result.

So what about switching on auto update stats? For this you’ll need to turn on the task scheduler if it’s not running already, which can be done with:

database sysadmin;
execute function task("scheduler start");

I would strongly recommend reviewing the enabled (and/or disabled) tasks and switch off any you don’t want or are not sure about. Review the jobs with:

database sysadmin;
select tk_name, tk_description from ph_task where tk_enable='t';

The relevant jobs for auto update stats are mon_table_profile, Auto Update Statistics Evaluation and Auto Update Statistics Refresh. Most of the others are fairly benign but I disable auto_tune_cpu_vps, add_storage, Low Memory Reconfig and mon_low_storage:

database sysadmin;
update ph_task set tk_enable = 'f' where tk_name in ('auto_tune_cpu_vps', 'add_storage', 'Low Memory Reconfig', 'mon_low_storage');

You’ll also find in table ph_threshold several parameters related to auto update stats:

  • AUS_AGE
  • AUS_PDQ
  • AUS_CHANGE
  • AUS_AUTO_RULES
  • AUS_SMALL_TABLES

Most of these are well-documented in the manual but the explanation of parameter AUS_AUTO_RULES is unclear as it just talks about enforcing a base set of rules. My understanding of the parameter is that:

  • When set to zero, auto update stats just maintains whatever statistics and distributions you have already. This retains any custom resolutions and confidence values you may have.
  • When set to one, it does the above plus it also makes sure that low stats are gathered on all tables, distributions in high mode for all leading index columns and distributions in medium mode for columns that are part of an index but not a leading key.

You can just update the parameters with manual SQL updates on the ph_threshold table. Likewise you’ll need to review and possibly update the scheduled run times for the two auto update statistics tasks in the ph_task table.
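
As a sketch of what such a manual tweak looks like (the name and value columns are as I recall them from the ph_threshold table, so check them on your version):

database sysadmin;
update ph_threshold set value = '10' where name = 'AUS_AGE';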

By default you just get one process updating stats but it’s possible to have two or more running at the same time by inserting a new row into table ph_task. I’d make sure that the total effective PDQ priority of all these tasks does not exceed 100.

At this point we’re sort of ready to go but you’re now trusting your stats gathering to a new process and I would suggest setting up some kind of monitoring to make sure it’s working as you expect. I feel this is a slight weakness in the implementation because you are back to writing your own scripts. Maybe one answer is to use OAT but this is not a good solution in our environment.

I would suggest monitoring the following:

  • That all three tasks related to auto update stats are enabled and scheduled to run at least once a week.
  • That the db scheduler is running, perhaps by checking for its threads with onstat -g ath.
  • That the values for the various AUS* parameters are sane.
  • That UPDATE STATISTICS LOW was not run too long ago for all tables. If you set AUTO_STAT_MODE it gets a little more complicated because you’ll need to use the information in Keshava Murthy’s blog post, referenced above, to work out whether the table needs to be updated.
  • Something similar for your distributions.
  • For any issues encountered whilst running the UPDATE STATISTICS statements. For this, query the aus_command table and check for rows where aus_cmd_state is E for error; the SQL error code and ISAM error code will then be in columns aus_cmd_err_sql and aus_cmd_err_isam respectively (see the example query after this list).
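
Here is the sort of check I mean for the last point (a sketch: aus_cmd_exe is, I believe, the column holding the statement text, but verify the column names on your version):

database sysadmin;
select aus_cmd_exe, aus_cmd_err_sql, aus_cmd_err_isam
from aus_command
where aus_cmd_state = 'E';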

One problem you might find is that your scheduled maintenance times are not long enough to keep pace with how frequently you require your stats to be updated. You can look at adding an extra process or extending the times in this case. Even better, consider reading John Miller’s excellent article on tuning update statistics. It’s now over ten years old but still completely relevant today.

There is a view in the sysadmin database called aus_cmd_comp which shows you all the commands run recently. It gets purged daily so if you want to keep a permanent or longer record you might want to consider writing a procedure to copy its contents elsewhere and creating a scheduled task to call it.
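
A minimal sketch of that idea, assuming a permanent table called aus_cmd_history has already been created with the same columns as the view (and ignoring de-duplication of rows already copied):

database sysadmin;
insert into aus_cmd_history select * from aus_cmd_comp;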

It’s worth noting that auto update stats doesn’t do anything with stored procedure plans, i.e. UPDATE STATISTICS FOR PROCEDURE. If there are open statement handles using procedures, running this can risk a -710 error unless (and sometimes even if) AUTO_REPREPARE is set in your onconfig. Whatever the situation on your system, you’ll need to do this manually or by another means.
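
For reference, the statements you would need to run yourself are of this form (the specific procedure name is just an example):

update statistics for procedure;               -- all procedures in the current database
update statistics for procedure my_procedure;  -- or a single procedure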

Finally you might be wondering how the Auto Update Statistics Evaluation task prioritises tables for updating. The answer to this is in the procedures in $INFORMIXDIR/etc/sysadmin/sch_aus.sql.

In summary I like auto update stats and recommend it as long as you have a good understanding of how it works and are aware of the points I’ve raised in this article. It integrates nicely with the new features in Informix 11.70. I like the fact that it is easy to set up, although I do believe you need to monitor it if up-to-date stats are critical to your system. By not gathering medium-mode distributions on non-indexed columns and not running UPDATE STATISTICS LOW separately for each index, it does significantly less work than dostats. I appreciate the nice touches it has, like retaining and maintaining the existing resolution of your statistics.

As I said at the start of the article, feedback is welcomed and encouraged.


Prepared statements and the SQL statement cache

Recently I was asked whether I use the SQL statement cache. The answer was no but the more interesting question was why not. When I thought about the reasons why not most of them were out of date or boiled down to a lack of understanding of the finer details on my part.

Thinking about it for a little longer, Informix has always been efficient at parsing or optimising SQL statements and this is an area where it seems to scale without difficulty so I have never had great cause to turn it on. However, Informix must also be using extra CPU cycles for parsing and optimising when a good plan could be ready to go in the cache.

A performance test could be the subject of a future blog post but I want to look at controlling the cache. For example, what happens if a “bad plan” gets cached? How would I get it out of there or force a re-parse and re-optimisation?

As ever, I will be using my trusty sandbox to investigate, currently running Informix 11.70.FC5W1 with STMT_CACHE set to 1 (enabled at session level).

For my tests I set up a log table, the sort of thing that could be used to record user log-ons, with two columns: a varchar for the user name and a date/time column for the log-on time. Both these columns are separately indexed. I will query the table using both columns in the where clause, giving the optimiser the choice of using either index (or a full-scan, in practice not used) to read from the table. The log-on time index is useful for queries intended to show the latest log-ons, independent of the user, but Informix may also be tempted to use it if the expected number of values returned is low and it is deemed to be more selective than the index on the user name. For the queries I am going to run, the user name index will generally be the most efficient but might not always be part of the plan chosen.

I’ll execute the same prepared statement twice with two sets of bind variables. In my tests I want to find out:

  • In the case where the statement cache is off or the statement is not in the cache, what determines the initial query plan?
  • Once the query is cached, what can I do, short of flushing the entire cache, to force a re-parse?

The schema for my test is:

create schema authorization informix
create table logons (
  logon_timestamp datetime year to second,
  username varchar(32)
) extent size 460000 next size 20000 lock mode row;

create index informix.ix_username on informix.logons (username) using btree;
create index informix.ix_logon_timestamp on informix.logons (logon_timestamp) using btree;

I have around 8 million unique rows in this table and distributions gathered in HIGH mode at resolution 0.5 on both columns.
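
For reference, those distributions were gathered with statements along these lines:

update statistics high for table logons (username) resolution 0.5;
update statistics high for table logons (logon_timestamp) resolution 0.5;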

The Perl code snippet I am going to test with is:

$$dbh->do("SET EXPLAIN ON");
$$dbh->do("SET STATEMENT CACHE OFF"); # Change to ON, as required

my $sql = "SELECT * FROM logons WHERE logon_timestamp BETWEEN ? AND ? AND username=?";
my $sth = $$dbh->prepare($sql);

# Execute prepared statement without any bind variables (forces optimisation). Comment out to test optimisation with bind values.
$sth->execute();

# Execute prepared statement with given bind variables: change as required
$sth->execute('2010-01-11 12:50:50', '2013-01-11 12:50:50', 'BTHOMPSON');
my $count = 0;
while (my @columns = $sth->fetchrow_array) {
    print "$columns[0]\n";
    $count++;
    last if $count == 10;
}

# Execute prepared statement with given bind variables: change as required
$sth->execute('2013-01-11 12:50:50', '2013-01-11 12:50:50', 'BTHOMPSON');
$count = 0;
while (my @columns = $sth->fetchrow_array) {
    print "$columns[0]\n";
    $count++;
    last if $count == 10;
}

$sth->finish();

The key thing about the code is that the statement uses bind variables and is prepared only once.

Some initial results are (with the statement cache off):

Lower timestamp bind value    Upper timestamp bind value    User name bind value    Index used
2013-01-11 12:50:50           2013-01-11 12:50:50           BTHOMPSON               ix_logon_timestamp
2010-01-11 12:50:50           2013-01-11 12:50:50           BTHOMPSON               ix_username
<None>                        <None>                        BTHOMPSON               ix_logon_timestamp

So far, so fiddled. But one interesting result has already dropped out. If I prepare and optimise the statement without any bind values, I fix the plan. I achieve this in Perl by calling $sth->execute without providing any bind values and without fetching any rows. When I re-execute the statement with bind variables, I find the statement has already been optimised and subsequent executions will use the same plan. I had expected that I would have to supply some real bind variables, but the plan appears to be fixed even with no bind variables supplied initially. I am not sure what this means in practice, since you would probably not do this in your code, but it is an interesting result nonetheless. It is certainly not the same as binding blank values or nulls, and Informix will generate an explain plan for the query the first time $sth->execute is called.

Let’s switch on the statement cache (SET STATEMENT CACHE ON) and see if there are any differences. Well, there are none the first time the script is run, but subsequent runs with different settings will re-use the initial plan. We need to flush the cache with onmode -e flush each time to force the plan to be re-parsed.

We can see that there is now a statement cache entry with onstat -g ssc:

> onstat -g ssc

IBM Informix Dynamic Server Version 11.70.FC5W1 -- On-Line -- Up 6 days 02:45:27 -- 1224528 Kbytes

Statement Cache Summary:
#lrus   currsize             maxsize              Poolsize             #hits   nolimit 
4       24440                524288               40960                0       0       

Statement Cache Entries: 

lru hash ref_cnt hits flag heap_ptr      database           user
--------------------------------------------------------------------------------
  0   51       0    0   -F 83df8038      optimiser_test     thompsonb
  SELECT * FROM logons WHERE logon_timestamp BETWEEN ? AND ? AND username=?



    Total number of entries: 1.

As I mentioned before, in this example the plan using the index on logon_timestamp is generally a poor choice for anything other than the smallest time ranges. As a common plan for this query, the index on the user name would be the best. So what am I to do when the statement cache is on and the first bind values used caused the optimiser to settle on the index on logon_timestamp?

onmode -e flush is going to work but it’s a bit of a sledgehammer and might mean I have to watch for other queries being re-optimised badly. An alternative is to perform some DDL on (one of) the table(s) in the query. This still affects all queries using that table but is more targeted than flushing the cache. A trick I have learned from Oracle, where this is a common problem because all statements are cached in the shared pool, and which also works with Informix, is to perform a grant or revoke as the DDL statement, e.g.:

grant select on informix.logons to thompsonb as informix;

If done carefully, you can grant a privilege that is not needed and then revoke it again immediately afterwards.
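
For example, granting and then immediately revoking a table privilege the user does not actually need (Alter is chosen purely for illustration here):

grant alter on informix.logons to thompsonb as informix;
revoke alter on informix.logons from thompsonb;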

One good thing is that when you prepare and then execute the query for which there is already a plan in the statement cache, the explain output will show you the cached plan. One clue that it is a cached plan and has not been reprepared seems to be that the estimated cost is shown as zero.

As a result of these tests, I now feel better equipped to investigate and deal with query performance issues on instances where the statement cache is switched on globally or enabled at session level.


SELECT… FOR UPDATE

I’m a DBA and not really a developer so I don’t often get involved in writing SPL or complex code unless it’s for a monitoring or maintenance script.

Recently I was asked for some help with the SELECT… FOR UPDATE construct, the end result of which made me realise that I didn’t understand the effects of using this. If you’re an experienced Informix person, this is another one of my “so-what” posts as I am not covering anything new. However, this feature is not brilliantly documented in the manual and so I think it’s worth covering.

I had thought or assumed that SELECT… FOR UPDATE would select the rows and place exclusive locks on them, preventing another session updating them before I did. This is the functionality I wanted but this is not what it does.

So what does it do? Luckily on my desk is a dusty copy of Managing and Optimizing Dynamic Server 2000 Databases, left by a previous DBA and looking like the handbook given out at a training course. This holds some of the answers. And there is no better way of testing what’s in there than by firing up my DBA sandbox.

I begin by creating a table on which I’m going to perform some updates, and use multiple sessions to see the effect of locking.

create table tobeupdated (
col1 int not null,
col2 int not null
);

insert into tobeupdated values (1,1);
insert into tobeupdated values (2,1);
insert into tobeupdated values (3,1);
insert into tobeupdated values (4,1);
insert into tobeupdated values (5,1);
insert into tobeupdated values (6,1);

create unique index ix_tobeupdated on tobeupdated (col1);
alter table tobeupdated add constraint primary key (col1) constraint pk_tobeupdated;

For reasons that will become clear soon, I should mention that I am using a logged non-ANSI database and therefore the default isolation mode is committed read.

To use SELECT… FOR UPDATE we must explicitly begin a transaction so let’s run the following:

> begin work;

Started transaction.

> select * from tobeupdated where col1=1 for update;

col1 col2

1 1

1 row(s) retrieved.

And now I will check what locks this has placed with onstat -k. From querying systables and sysfragments I know the part numbers in hexadecimal for my table and its unique index are 0x00800654 and 0x00900D40 respectively.
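
The lookups were along these lines (a sketch: hex() is a built-in function and, on my version at least, the detached index partition shows up in sysfragments, so check which rows are index partitions on yours):

select hex(partnum) from systables where tabname = 'tobeupdated';

select indexname, hex(partn)
from sysfragments
where tabid = (select tabid from systables where tabname = 'tobeupdated');

Here is the onstat -k output with all irrelevant lines removed: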

Locks
address wtlist owner lklist type tblsnum rowid key#/bsiz
44ec7d98 0 7d448418 44f0b298 HDR+IX 800654 0 0

This was my first surprise when investigating it initially: there is no exclusive row lock; I have just an intended lock on the table.

So what can other sessions do to the row I have just selected for update? It seems they can select it AND update it:

> select * from tobeupdated where col1=1;

col1 col2

1 1

1 row(s) retrieved.

> update tobeupdated set col2=2 where col1=1;

1 row(s) updated.

Where does this leave my original session, still with its open transaction?

> select * from tobeupdated where col1=1 for update;

col1 col2

1 2

1 row(s) retrieved.

Arrgghh! My row has been updated underneath me. If I were to update it now, the original session would lose its update.

I thought about using repeatable read isolation to keep the row lock after the select. That dusty old training manual to the rescue… It seems I was thinking along the right lines but there is no need to go that far. You can also issue the statements:

SET ISOLATION TO DIRTY READ RETAIN UPDATE LOCKS;
SET ISOLATION TO COMMITTED READ RETAIN UPDATE LOCKS;
SET ISOLATION TO CURSOR STABILITY RETAIN UPDATE LOCKS;

The training manual states:

“The RETAIN UPDATE LOCKS feature is a switch which can be turned on and off at any time during a user connection to the server. It only effects SELECT… FOR UPDATE statements with isolation levels DIRTY READ, COMMITTED READ and CURSOR STABILITY.

“When the UPDATE LOCK has been placed on a row during a FETCH of a SELECT… FOR UPDATE with one of the above isolation levels it is not released at the subsequent FETCH or when the cursor is closed. The UPDATE LOCK is retained until the end of the transaction. This feature lets the user avoid the overhead of Repeatable Read isolation level or work arounds such as dummy updates on a row.”

Let’s see what effect one of these has by running through the same process again:

> begin work;

Started transaction.

> set isolation to committed read retain update locks;

Isolation level set.

> select * from tobeupdated where col1=5 for update;

col1 col2

5 1

1 row(s) retrieved.

Running onstat -k shows I have an extra lock that was not there before:

Locks
address wtlist owner lklist type tblsnum rowid key#/bsiz
44ec7d18 0 7d44b5c8 44f0b298 HDR+U 800654 105 0
44f0b298 0 7d44b5c8 44f0cd98 HDR+IX 800654 0 0

It looks promising because the lock is on a specific row. The HDR+U lock is a special promotable lock used for rows that have been retrieved for update but not yet updated. It prevents anyone else from placing an exclusive or promotable lock on the object.

A quick check with a second session shows that we can’t place an exclusive lock to update the row:

> update tobeupdated set col2=10 where col1=5;

244: Could not do a physical-order read to fetch next row.

107: ISAM error: record is locked.
Error in line 1
Near character position 42

We can, however, select from the table in the second session showing that the promotable lock is just that: a shared lock that can be promoted to an exclusive lock.

> select * from tobeupdated where col1=5;

col1 col2

5 1

1 row(s) retrieved.

But we can’t perform a second SELECT… FOR UPDATE in the second session until the first session commits or rolls back:

> begin work;

Started transaction.

> select * from tobeupdated where col1=5 for update;

col1 col2

244: Could not do a physical-order read to fetch next row.

107: ISAM error: record is locked.
Error in line 1
Near character position 48

So in summary, I nearly got what I wanted or expected in the first place when I used RETAIN UPDATE LOCKS. However, there appears to be no way of placing an exclusive lock on the row before the update, preventing another session from selecting it unless that session is well-behaved and uses SELECT… FOR UPDATE as well. It looks like dummy updates could still be needed to prevent any code doing a normal SELECT and then a subsequent update.

I am struggling to think of a situation where SELECT… FOR UPDATE is useful without either RETAIN UPDATE LOCKS or repeatable read isolation. Perhaps someone could suggest a scenario in the comments on my WordPress blog?

This was an interesting thing to investigate on a Friday afternoon and I hope it is useful to someone. To avoid anyone trying to debug a stored procedure that is never going to work, it’s worth adding a note that SELECT… FOR UPDATE cannot be used in SPL, at least not directly. You need to look at using UPDATE… WHERE CURRENT OF inside a FOREACH loop instead, as in the sketch below.
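
To round this off, here is a minimal SPL sketch of that pattern using the test table from earlier; the procedure name and logic are purely illustrative and based on my reading of the FOREACH documentation:

create procedure bump_all()
    define v_col1 int;

    -- a named FOREACH cursor is what allows the WHERE CURRENT OF clause
    foreach curs for
        select col1 into v_col1 from tobeupdated
        update tobeupdated set col2 = col2 + 1 where current of curs;
    end foreach;
end procedure;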