Foreign key creation performance

The Agile development model IBM has moved to for Informix development means that there are always a few extra features in each fix pack. There are two rather nice enhancements in 11.70.FC8 for building foreign keys which alleviate some pain points when doing administration on large tables. The first is a no validate option, which means that no checking of the data is done when creating the foreign key. This is appropriate in situations like migrating platforms, when you know that referential integrity is already assured and you want to avoid the overhead of re-checking it all on the new system. I want to look instead at the other enhancement: improved performance when creating a foreign key.
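
As a brief aside, the no validate option is just an extra keyword on the constraint definition; a hedged sketch, based on the tables used in the test case later in this post (check the ALTER TABLE syntax diagram for your fix pack for the exact placement of the keyword):

alter table referencing add constraint foreign key (widget_id) references referenced (widget_id) novalidate;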

Nearly all of the time creating a foreign key constraint is spent checking the data; modifying the database schema to add the constraint is a very quick operation.

Prior to 11.70.FC8 creating a foreign key on a large table was a long process and with non-partitioned tables it was a single-threaded operation which did not benefit from the use of indices. Typically you’d see a thread working away like this:

tid      name     rstcb            flags    curstk   status
283      sqlexec  26e1fc3c78       --BPR--  22096    running-

Or with a fragmented table and PDQ turned on you might see multiple threads:

tid      name     rstcb            flags    curstk   status
324      sqlexec  26e1fc3c78       --BP---  19744    sleeping secs: 1     -
325      scan_1.0 26e1fc0a98       --B-R--  3248     running-
326      scan_1.1 26e1fc44c8       --B-R--  3248     running-
327      scan_1.2 26e1fc4d18       --B-R--  3248     running-
328      scan_1.3 26e1fc5568       --B-R--  3248     running-
329      scan_1.4 26e1fc5db8       --B-R--  3248     running-
330      scan_1.5 26e1fc6608       --B-R--  3248     running-
331      scan_1.6 26e1fc97e8       --B-R--  3248     running-
332      scan_1.7 26e1fcb0d8       --B-R--  3248     running-

Even with parallel operations foreign key builds could still take a long time.

With 11.70.FC8 foreign key builds are a lot faster, and here is a test case which you can try. I’m going to create a 10 million row table, which is large enough to demonstrate the problem, plus a referenced table holding its distinct values (which range from 0 to 32767), and see how long it takes to create a foreign key.

Courtesy of the Informix FAQ I am going to create these procedures to generate random numbers.

CREATE PROCEDURE sp_setseed(n INTEGER)
  DEFINE GLOBAL seed DECIMAL(10) DEFAULT 1;

  LET seed = n;

END PROCEDURE;

CREATE PROCEDURE sp_random() RETURNING INTEGER;
  DEFINE GLOBAL seed DECIMAL(10) DEFAULT 1;
  DEFINE d DECIMAL(20,0);

  LET d = (seed * 1103515245) + 12345;
  -- MOD function does not handle 20-digit values
  LET seed = d - 4294967296 * TRUNC(d / 4294967296);

  RETURN MOD(TRUNC(seed / 65536), 32768);

END PROCEDURE;

These will generate a random number between 0 and 32767 and I’ll use them to create some data to populate my tables.

drop procedure if exists populate;
create procedure populate ()

  define p_wid int;
  define i int;

  drop table if exists referenced;
  create table referenced (
    widget_id int not null
  );

  drop table if exists referencing;
  create table referencing (
    id_column bigserial not null,
    widget_id int not null
  );

  FOR i = 1 TO 10000000 STEP 1

    let p_wid = sp_random();
    insert into referencing (widget_id) values (p_wid);

  END FOR;

  create index ireferencing_x1 on referencing (widget_id);
  insert into referenced (widget_id) select distinct widget_id from referencing;

  create unique index ireferenced_u1 on referenced (widget_id);
  alter table referenced add constraint primary key (widget_id) constraint referenced_pk;

end procedure;

execute procedure populate();

That sets up the test. It just remains to create the foreign key, which I’ll time:

alter table referencing add constraint foreign key (widget_id) references referenced (widget_id) constraint referencing_fk;
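
To reproduce the parallel cases in the results below, PDQ can be enabled for the session before running the alter, for example:

set pdqpriority 100;
alter table referencing add constraint foreign key (widget_id) references referenced (widget_id) constraint referencing_fk;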

And so for the results:

Informix version | Fragmented table                                       | PDQ priority | Parallel execution | Foreign key build time (s)
11.70.FC7        | No                                                     | 0            | No                 | 40
11.70.FC7        | No                                                     | 100          | No                 | 40
11.70.FC7        | Yes, 8-way partition by expression, evenly distributed | 0            | No                 | 40
11.70.FC7        | Yes, 8-way partition by expression, evenly distributed | 100          | Yes                | 40
11.70.FC7        | Yes, 8-way round-robin                                 | 0            | No                 | 41
11.70.FC7        | Yes, 8-way round-robin                                 | 100          | Yes                | 67
11.70.FC8        | No                                                     | 0            | No                 | 0.3
11.70.FC8        | No                                                     | 100          | No                 | 0.3
11.70.FC8        | Yes, 8-way partition by expression, evenly distributed | 0            | No                 | 0.5
11.70.FC8        | Yes, 8-way partition by expression, evenly distributed | 100          | Yes                | 40
11.70.FC8        | Yes, 8-way round-robin                                 | 0            | No                 | 0.9
11.70.FC8        | Yes, 8-way round-robin                                 | 100          | Yes                | 68

And for comparison here is the time taken for some related SQL queries (without PDQ):

Query                                                                                        | Duration (s)
select distinct widget_id from referencing;                                                  | 9
select widget_id, count(*) from referencing group by widget_id;                              | 7
select widget_id from referencing where widget_id not in (select widget_id from referenced); | 37

The purpose of this table is to show the relative times of the foreign key build and not to benchmark any hardware or Informix itself, so I’m not going to divulge what hardware I used for the tests. The server I used did have multiple processors and CPU VPs configured so that PDQ queries could scale out. The number of ready threads was very low or zero during the operation.

I ran all the tests a few times and there was a small amount of variation so I’ve only used a low precision in the results.

So what conclusions can we draw from this?

  • The stand-out result is that the verification of foreign keys is much faster in FC8 but only for serial operations.
  • Serial foreign key builds in FC8 are significantly faster than any SQL query I could write that checked the referential integrity.
  • In FC7 and earlier parallel operations may be slower.
  • In FC8 there is not the same speed improvement for parallel operations and they are still much slower.

Another interesting finding is the large difference in performance between the expression-based fragmentation and round-robin fragmentation when using parallel operations. I had assumed that this kind of operation would be agnostic to the type of fragmentation used. I checked this a few times, and examining the threads while the build is running shows that something different is definitely happening.

Round-robin fragmentation looks like the below. Notice that most of the threads are active:

tid      name     rstcb            flags    curstk   status
123535   sqlexec  15f2599e8        --BP---  29472    sleeping secs: 1     -
123536   scan_1.0 15f25e590        --B-R--  2400     running-
123537   scan_1.1 15f262068        Y-B----  496      cond wait  await_MC1 -
123538   scan_1.2 15f25c3f0        Y-B----  496      cond wait  await_MC1 -
123539   scan_1.3 15f261800        --B----  2400     running-
123540   scan_1.4 15f2628d0        --B-R--  2400     running-
123541   scan_1.5 15f263138        --B-R--  2400     running-
123542   scan_1.6 15f25edf8        --B-R--  2400     running-
123543   scan_1.7 15f25f660        --B----  2400     running-

Expression fragmentation seems to run with just one active thread for the majority of the time:

tid      name     rstcb            flags    curstk   status
123804   sqlexec  15f258918        --BP---  29472    sleeping secs: 1     -
123805   scan_1.0 15f25e590        --B-R--  2400     running-
123806   scan_1.1 15f25c3f0        Y-B----  496      cond wait  await_MC1 -
123807   scan_1.2 15f261800        Y-B----  496      cond wait  await_MC1 -
123808   scan_1.3 15f2628d0        Y-B----  496      cond wait  await_MC1 -
123809   scan_1.4 15f263138        Y-B----  496      cond wait  await_MC1 -
123810   scan_1.5 15f25edf8        Y-B----  496      cond wait  await_MC1 -
123811   scan_1.6 15f25f660        Y-B----  496      cond wait  await_MC1 -
123812   scan_1.7 15f2639a0        Y-B----  496      cond wait  await_MC1 -
123813   scan_1.8 15f260f98        Y-B----  496      cond wait  await_MC1 -

There is an extra scan thread (scan_1.8) because I have an (empty) remainder fragment in addition to my 8-way fragmentation.

This would go some way to explaining why round-robin is faster.

Finally, if I could wander into the realms of speculation for a moment, it does look like IBM has implemented the parallel foreign key constraint checking slightly differently for each fragmentation method, which might explain why the improved performance is only seen with serial operations.

To sum up, this is a wonderful enhancement but I’ll be careful to switch off parallel processing when building foreign keys. Parallel processing still greatly speeds up index builds on fragmented tables, which is the type of operation you might do at the same time, so I can foresee scripts enabling and disabling parallel processing several times to get the best overall speed.


Monitoring virtual segment usage and the CPU VP caches

A little while ago I was forced to look in detail at the memory usage of one of the production instances I look after. Specifically, the problem was that the instance was allocating extra virtual segments to itself (via the SHMADD process), which was a surprise because memory usage on this instance was being monitored and the monitoring clearly showed that usage was normal and well below the initial segment size.

Well, almost everything showed that; onstat -g seg was the exception and was correctly reporting the memory usage. Without wishing to put in an early spoiler, this is the only way of seeing how much memory your system is using and how close you are to your instance allocating an extra segment.

With the help of a careful analysis by IBM Informix support, we looked into memory usage on our system using:

onstat -u
onstat -g mem
onstat -g mgm
onstat -g ses

We also drilled down into individual sessions using:

onstat -g afr
onstat -g ffr
onstat -g ufr

We also looked into memory usage by onbar and SQLTRACE.

The result was a massive discrepancy between the total memory in use, as reported by these tools, and what onstat -g seg was reporting. And onstat -g seg appeared to be right because when it said the memory was all used up, the engine would allocate another segment.

So where was the memory going? Was it a leak or a bug? Well no, it was a new feature or, as we later learned, a bug fix.

In response to a performance issue reported by another customer, IBM had redesigned the VP memory cache code in 11.70.FC7W1 and it turned out that this was responsible for the usage of up to 70% of the virtual memory segment on our system. This is an enterprise edition only feature, so I guess if you use any other edition you can stop reading at this point and just note that monitoring the output from onstat -g seg is a great idea if you want to predict segment additions.

The CPU memory cache allocates a dedicated area of the virtual memory segment to each CPU virtual processor, which it can use exclusively without contending with other virtual processors.

So a new and undocumented feature in a W1 release? Isn’t this a bit irregular? Well, no said IBM, it was a fix for a customer issue. But it does change the behaviour from what is documented quite dramatically.

In 11.70.FC7 and before, the CPU memory cache size is controlled by the VP_MEMORY_CACHE_KB parameter and if you have, say 8 CPU VPs, this results in a fixed area of memory of 8x VP_MEMORY_CACHE_KB being allocated to CPU memory caches and this is still how the manual says it works.

In 11.70.FC7W1 this parameter merely controls the initial size of these caches, which are then free to grow (and I think shrink below the initial size) as they see fit. To improve performance, memory can be allocated to these caches without having to free any first, and a separate thread deals with garbage collection (or drainage). (I hope I have explained this properly as I am not a programmer.) What is certain is that if your system is very busy the caches grow faster than the garbage collection clears them down. If your system is very busy for a sustained period, they can grow and allocate memory until you hit SHMTOTAL, if you’ve set it. (I don’t think hitting this limit would be very pretty because the instance would kick out sessions to free up memory, but this is not where the problem lies. Anyway, it would need testing and I haven’t done so.)

So can you monitor it? Yes you can and I’d recommend if you’re running 11.70.FC7W1 or above and have the VP cache switched on that you do. This little code snippet does the job of calculating the total size in Mb of all the VP caches for an instance:

#!/bin/bash
# Sum the block counts reported by onstat -g vpcache across all the CPU VP caches
vpcacheblks=0
# Take the second column of each data line of the onstat -g vpcache output
vpcache=(`onstat -g vpcache | grep -E '^    [0-9]' | awk '{ print $2 }'`)
for ((i=0; i<${#vpcache[*]}; i++)); do
    vpcacheblks=`expr $vpcacheblks + ${vpcache[$i]}`
done
# Convert blocks to Mb: the script assumes 4 kb blocks, i.e. 256 blocks per Mb
vpcachemb=`echo 'scale=1; '${vpcacheblks}' / 256' | bc`
echo $vpcachemb

You can also use the output from onstat -g vpcache to work out the number of missed drains using the formula (free - alloc) - drains.

If you have a busy system and particularly one with heavy peak periods, graphing the size of this over time is very interesting. Equally if your system is not that busy, you may see flat memory usage. It’s worth knowing which applies to you.

So if you’re reading this article and, having done a bit of investigation on your own system to see whether it affects you and found that it does, what can you do to mitigate? Here are some options:

  • Downgrade to 11.70.FC7 or earlier.
  • Set VP_MEMORY_CACHE_KB to 0. You can actually do this dynamically using onmode -wm to clear out the cache and then reset it to its original value immediately afterwards (see the sketch after this list).
  • Increase SHMVIRTSIZE to accommodate the growth. Of course you need spare memory in your server to do this.
  • Set SHMTOTAL to remove the possibility of your server swapping. If you do, also look at setting up the low memory manager.
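
A sketch of the second option, assuming an original onconfig setting of VP_MEMORY_CACHE_KB 800 (substitute your own value):

onmode -wm VP_MEMORY_CACHE_KB=0
onmode -wm VP_MEMORY_CACHE_KB=800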

So what are IBM doing about the situation? An APAR has been raised as follows:

IC95684 AFTER THE FIX FOR IC90645 (11.70.FC7W1 AND NEWER) THE VP PRIVATE CACHES CAN GROW UNCONTROLLABLY ON BUSY SYSTEMS

This should result (in 11.70.FC8) in the documentation being updated and a choice of STATIC or DYNAMIC modes will be available for the VP caches. DYNAMIC will be the same as the new behaviour and STATIC is more similar to how things were previously where the VP caches were a fixed size. Note I said more similar and not the same. It will be interesting to look at how this behaves when it’s available.

There’s also another issue touched on here and I’ve used the new request for enhancement (RFE) site to log it and that is that onstat -g mem does not include the VP cache sizes in its output and is therefore not a complete view of all the memory pools in your instance. The RFE requests that it is.


Working with auto update stats

This article is superseded by my more comprehensive post, Experience with Auto Update Statistics (AUS).

This article has been written based on experience with version 11.70.FC5W1 and assumes some knowledge of stats and distributions and how they affect query optimisation. I appreciate any feedback in the comments section on my WordPress blog.

Auto update stats was introduced in 11.50.xC1 and, while being partly aimed at the embedded market, meant for the first time there was a complete solution to gathering database statistics bundled inside the product.

Several other tools exist to help with gathering statistics, for example AGS’s Server Studio can produce a set of UPDATE STATISTICS commands in a script to run against your database. Anecdotally, most DBAs use Art Kagel’s dostats utility, packaged up in the utils2_ak package available from the IIUG software repository. Dostats is pretty damn good although it’s not a complete solution as some scripts are needed to control it. It comes with a partner utility, drive_dostats, to do this but many DBAs, including myself, have written their own. Because dostats is the de-facto standard, I’ll refer to it a fair bit in this article. Also version 11.70 has a number of enhancements that don’t require you to use auto update stats; I’ll cover these as well.

So if you’re happily using dostats or another method to manage statistics, should you consider changing to auto update stats? Should it be your method of choice for a new-build instance? Well maybe: this article will go through some of the advantages and things to be aware of.

Here are some of its advantages:

  • The whole solution is part of the database engine and supported by IBM support.
  • It provides a complete framework, working within defined maintenance windows and is highly configurable.
  • It can be managed through OAT, although this is not needed.
  • Auto update stats does less work and takes less time than many solutions because it does not (by default) gather distributions on non-indexed columns or do separate low stats for each entire index key.
  • It’s fully integrated with the enhancements to update stats introduced in version 11.70.
  • It works on all your databases, including the system ones.
  • Perhaps my favourite feature: if you make manual adjustments, like increasing the resolution of the distributions on a column, auto update stats notices this and maintains the distributions at the new resolution. Similarly, if you manually create a distribution on a column it will maintain this.

One reason some DBAs don’t use auto update stats is because it involves using the job scheduler, which I’m told had issues in early releases with high CPU usage. For this reason, many DBAs touch a file called $INFORMIXDIR/etc/sysadmin/stop to stop it starting when the engine comes online. With 11.70.FC5W1 we run the job scheduler without any issues. (As an aside, if you’re not using it, it’s worth looking at the jobs to see what you’re missing: post_alarm_message is particularly useful.)

So can you just enable the job scheduler and let auto update stats do its thing? Not really. The first thing to look at is these onconfig parameters, which are in 11.70 and take effect regardless of the statistics method used:

  • AUTO_STAT_MODE
  • STATCHANGE
  • USTLOW_SAMPLE

Using auto stat mode and a non-zero value for STATCHANGE is something you need to consider very carefully. Internally the engine keeps a count of the number of inserts, updates and deletes that occur on each table, something that Keshava Murthy covered in his blog. If these collectively do not exceed STATCHANGE percent of the row count, statistics or distributions are not updated. This applies even when you run an UPDATE STATISTICS command manually. Confusingly the command still returns ‘Statistics updated’ even when nothing is done; the only clue is that the prompt returns instantly. To get around this there is a new FORCE keyword for the UPDATE STATISTICS statement that reverts to the old behaviour.
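
For example, to force distributions to be rebuilt regardless of the change counters, the statement is along these lines (mytable and mycolumn are placeholder names, and the FORCE keyword sits at the end of the statement as I understand the syntax diagram):

update statistics high for table mytable (mycolumn) force;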

I find turning AUTO_STAT_MODE on and setting STATCHANGE to zero works quite well: this just skips tables where no updates, inserts or deletes have occurred.

You can set the value of STATCHANGE manually on individual tables with a fast-alter operation:

alter table table statchange change_threshold;

As it’s an update to systables, be aware an exclusive table lock is briefly needed.
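
For example, to tell the engine that a hypothetical orders table should only have its stats considered stale once 10 per cent of its rows have changed:

alter table orders statchange 10;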

I’m not keen on setting STATCHANGE to a non-zero value because we have a lot of tables with incrementing date/time fields, meaning query optimisation is often time-based. I would find the option to override the age-based AUS_AGE parameter on a per-table basis much more useful, something that can only be done by writing your own script. Fortunately, as auto update stats evaluates the tables with stale stats on a regular scheduled basis, any ad-hoc updates are taken into account in its scheduling.

Setting USTLOW_SAMPLE enables sampling for UPDATE STATISTICS LOW statements, which is generally a good thing and can dramatically reduce the time these statements take. It can be overridden in your user environment. Sampling generally works well as long as the table is not heavily skewed in some way: if Informix thinks it is you’ll see messages like this in the online log:

Warning: update statistics low using sampling may generate inaccurate index statistics for index owner.index_name due to data skew

Whether this is an issue for you will depend very much on your queries.

The other major enhancement in 11.70 is fragment-level statistics but I shan’t cover it in detail here. If your storage schema is compatible with it and your table access patterns mean that some table fragments are never updated, it’s extremely useful. Informix’s implementation is nice in that the table stats are still considered as a whole when the optimiser evaluates queries, so you don’t get into trouble with having no stats for new fragments.

Perhaps the most significant difference between dostats and auto update stats is that dostats gathers additional distributions on non-indexed columns using UPDATE STATISTICS MEDIUM. If you have such distributions already auto update stats will continue to maintain them but it won’t create any new ones. All automated tools are attempting to apply a set of general criteria and recommendations to all tables so there is no hard and fast rule about whether you need them. One case to consider is where you have some sort of status flag such as a boolean or a column with a limited set of allowed values, perhaps enforced by a check constraint. Here distributions could be useful where these columns are used as filter conditions in queries. Otherwise, I suspect that in a lot of cases they are not needed. You’ll need to decide what is appropriate for your system.

Dostats also gathers low statistics separately for different indices which takes extra time but in my tests using version 11.50.FC9W2 this didn’t make any difference to the end result.

So what about switching on auto update stats? For this you’ll need to turn on the task scheduler if it’s not running already, which can be done with:

database sysadmin;
execute function task("scheduler start");

I would strongly recommend reviewing the enabled (and/or disabled) tasks and switch off any you don’t want or are not sure about. Review the jobs with:

database sysadmin;
select tk_name, tk_description from ph_task where tk_enable='t';

The relevant jobs for auto update stats are mon_table_profile, Auto Update Statistics Evaluation and Auto Update Statistics Refresh. Most of the others are fairly benign but I disable auto_tune_cpu_vps, add_storage, Low Memory Reconfig and mon_low_storage:

database sysadmin;
update ph_task set tk_enable = 'f' where tk_name in ('auto_tune_cpu_vps', 'add_storage', 'Low Memory Reconfig', 'mon_low_storage');

You’ll also find in table ph_threshold several parameters related to auto update stats:

  • AUS_AGE
  • AUS_PDQ
  • AUS_CHANGE
  • AUS_AUTO_RULES
  • AUS_SMALL_TABLES

Most of these are well-documented in the manual but the explanation of parameter AUS_AUTO_RULES is unclear as it just talks about enforcing a base set of rules. My understanding of the parameter is that:

  • When set to zero, auto update stats just maintains whatever statistics and distributions you have already. This retains any custom resolutions and confidence values you may have.
  • When set to one, it does the above plus it also makes sure that low stats are gathered on all tables, distributions in high mode for all leading index columns and distributions in medium mode for columns that are part of an index but not a leading key.

You can just update the parameters with manual SQL updates on the ph_threshold table. Likewise you’ll need to review and possibly update the scheduled run times for the two auto update statistics tasks in the ph_task table.
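
For example, to change the age threshold to 20 days (a value chosen purely for illustration):

database sysadmin;
update ph_threshold set value = '20' where name = 'AUS_AGE';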

By default you just get one process updating stats but it’s possible to have two or more running at the same time by inserting a new row into table ph_task. I’d make sure that the total effective PDQ priority of all these tasks does not exceed 100.

At this point we’re sort of ready to go but you’re now trusting your stats gathering to a new process and I would suggest setting up some kind of monitoring to make sure it’s working as you expect. I feel this is a slight weakness in the implementation because you are back to writing your own scripts. Maybe one answer is to use OAT but this is not a good solution in our environment.

I would suggest monitoring the following:

  • That all three tasks related to auto update stats are enabled and scheduled to run at least once a week.
  • That the db scheduler is running, perhaps by checking for its threads with onstat -g ath.
  • That the values for the various AUS* parameters are sane.
  • That UPDATE STATISTICS LOW was not run too long ago for all tables. If you set AUTO_STAT_MODE it gets a little more complicated because you’ll need to use the information in Keshava Murthy’s blog post, referenced above, to work out whether the table needs to be updated.
  • Something similar for your distributions.
  • For any issues encountered whilst running the UPDATE STATISTICS statements. For this, query the aus_command table and check for rows where aus_cmd_state is E for error; the SQL error code and ISAM error code will then be in columns aus_cmd_err_sql and aus_cmd_err_isam respectively (see the example query after this list).
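
A sketch of the sort of error check I mean, run against the sysadmin database (I’ve included aus_cmd_exe, which I believe holds the command text, from memory, so check the column names against your own sysadmin schema):

database sysadmin;
select aus_cmd_exe, aus_cmd_err_sql, aus_cmd_err_isam from aus_command where aus_cmd_state = 'E';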

One problem you might find is that your scheduled maintenance times are not long enough to keep pace with how frequently you require your stats to be updated. You can look at adding an extra process or extending the times in this case. Even better, consider reading John Miller’s excellent article on tuning update statistics. It’s now over ten years old but still completely relevant today.

There is a view in the sysadmin database called aus_cmd_comp which shows you all the commands run recently. It gets purged daily so if you want to keep a permanent or longer record you might want to consider writing a procedure to copy its contents elsewhere and creating a scheduled task to call it.
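
A minimal sketch of the idea, assuming you have already created a hypothetical aus_cmd_history table with the same structure as the view:

database sysadmin;
insert into aus_cmd_history select * from aus_cmd_comp;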

It’s worth noting that auto update stats doesn’t do anything with stored procedure plans, i.e. UPDATE STATISTICS FOR PROCEDURE. If there are open statement handles using procedures, doing so can risk a -710 error unless (and sometimes even if) AUTO_REPREPARE is set in your onconfig. Whatever the situation on your system, you’ll need to do this manually or by another means.
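
For reference, the statements themselves are simple, for example:

update statistics for procedure;          -- all procedures in the current database
update statistics for procedure myproc;   -- or just a single procedure (myproc is a placeholder)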

Finally you might be wondering how the Auto Update Statistics Evaluation task prioritises tables for updating. The answer to this is in the procedures in $INFORMIXDIR/etc/sysadmin/sch_aus.sql.

In summary I like auto update stats and recommend it as long as you have a good understanding of how it works and are aware of the points I’ve raised in this article. It integrates nicely with the new features in Informix 11.70. I like the fact that it is easy to set up, although I do believe you need to monitor it if up to date stats are critical to your system. By not gathering medium-mode distributions on non-indexed columns and not running separate update statistics low statements for each index key, it does significantly less work than dostats. I appreciate the nice touches it has, like retaining and maintaining the existing resolution of your statistics.

As I said at the start of the article, feedback is welcomed and encouraged.


Prepared statements and the SQL statement cache

Recently I was asked whether I use the SQL statement cache. The answer was no but the more interesting question was why not. When I thought about the reasons, most of them were out of date or boiled down to a lack of understanding of the finer details on my part.

Thinking about it for a little longer, Informix has always been efficient at parsing or optimising SQL statements and this is an area where it seems to scale without difficulty so I have never had great cause to turn it on. However, Informix must also be using extra CPU cycles for parsing and optimising when a good plan could be ready to go in the cache.

A performance test could be the subject of a future blog post but I want to look at controlling the cache. For example, what happens if a “bad plan” gets cached? How would I get it out of there or force a re-parse and re-optimisation?

As ever, I will be using my trusty sandbox to investigate, currently running Informix 11.70.FC5W1 with STMT_CACHE set to 1 (enabled at session level).

For my tests I set up a log table, the sort of thing that could be used to record user log-ons, with two columns: a varchar for the user name and a date/time column for the log-on time. Both these columns are separately indexed. I will query the table using both columns in the where clause, giving the optimiser the choice of using either index (or a full-scan, in practice not used) to read from the table. The log-on time index is useful for queries intended to show the latest log-ons, independent of the user, but Informix may also be tempted to use it if the expected number of values returned is low and it is deemed to be more selective than the index on the user name. For the queries I am going to run, the user name index will generally be the most efficient but might not always be part of the plan chosen.

I’ll execute the same prepared statement twice with two sets of bind variables. In my tests I want to find out:

  • In the case where the statement cache is off or the statement is not in the cache, what determines the initial query plan?
  • Once the query is cached, what can I do, short of flushing the entire cache, to force a re-parse?

The schema for my test is:

create schema authorization informix
create table logons (
  logon_timestamp datetime year to second,
  username varchar(32)
) extent size 460000 next size 20000 lock mode row;

create index informix.ix_username on informix.logons (username) using btree;
create index informix.ix_logon_timestamp on informix.logons (logon_timestamp) using btree;

I have around 8 million unique rows in this table and high level statistics at resolution 0.5 on both columns.

The Perl code snippet I am going to test with is:

$$dbh->do("SET EXPLAIN ON");
$$dbh->do("SET STATEMENT CACHE OFF"); # Change to ON, as required

my $sql = "SELECT * FROM logons WHERE logon_timestamp BETWEEN ? AND ? AND username=?";
my $sth = $$dbh->prepare($sql);

# Execute prepared statement without any bind variables (forces optimisation). Comment out to test optimisation with bind values.
$sth->execute();

# Execute prepared statement with given bind variables: change as required
$sth->execute('2010-01-11 12:50:50', '2013-01-11 12:50:50', 'BTHOMPSON');
my $count = 0;
while (my @columns = $sth->fetchrow_array) {
    print "$columns[0]\n";
    $count++;
    last if $count == 10;
}

# Execute prepared statement with given bind variables: change as required
$sth->execute('2013-01-11 12:50:50', '2013-01-11 12:50:50', 'BTHOMPSON');
$count = 0;
while (my @columns = $sth->fetchrow_array) {
    print "$columns[0]\n";
    $count++;
    last if $count == 10;
}

$sth->finish();

The key thing about the code is that the statement uses bind variables and is prepared only once.

Some initial results are (with the statement cache off):

Lower timestamp bind value | Upper timestamp bind value | User name bind value | Index used
2013-01-11 12:50:50        | 2013-01-11 12:50:50        | BTHOMPSON            | ix_logon_timestamp
2010-01-11 12:50:50        | 2013-01-11 12:50:50        | BTHOMPSON            | ix_username
<None>                     | <None>                     | BTHOMPSON            | ix_logon_timestamp

So far, so fiddled. But one interesting result has already dropped out. If I prepare and optimise the statement without any bind values I fix the plan. I achieve this in Perl by using the $sth->execute function but I don’t provide any bind values and don’t fetch any rows. When I re-execute the statement with bind variables I find the statement has already been optimised and subsequent executions will use the same plan. I had expected that I would have to supply some real bind variables but this appears to be the case even with no bind variables initially supplied. I am not sure what this means in practice, since you would probably not do this in your code, but it is an interesting result nonetheless. It is certainly not the same as binding blank values or nulls and Informix will generate an explain plan for the query the first time $sth->execute is called.

Let’s switch on the statement cache (SET STATEMENT CACHE ON) and see if there are any differences. Well, there are none the first time the script is run, but subsequent runs with different settings will re-use the initial plan. We need to flush the cache with onmode -e flush each time to force the statement to be re-parsed.

We can see that there is now a statement cache entry with onstat -g ssc:

> onstat -g ssc

IBM Informix Dynamic Server Version 11.70.FC5W1 -- On-Line -- Up 6 days 02:45:27 -- 1224528 Kbytes

Statement Cache Summary:
#lrus   currsize             maxsize              Poolsize             #hits   nolimit 
4       24440                524288               40960                0       0       

Statement Cache Entries: 

lru hash ref_cnt hits flag heap_ptr      database           user
--------------------------------------------------------------------------------
  0   51       0    0   -F 83df8038      optimiser_test     thompsonb
  SELECT * FROM logons WHERE logon_timestamp BETWEEN ? AND ? AND username=?



    Total number of entries: 1.

As I mentioned before, in this example the plan using the index on logon_timestamp is generally a poor choice for anything other than the smallest time ranges. As a common plan for this query, the index on user name would be the best. So what am I to do when the statement cache is on and the first set of bind values used caused the optimiser to settle on the index on logon_timestamp?

onmode -e flush is going to work but it’s a bit of a sledgehammer and might mean I have to watch for other queries being re-optimised badly. Another alternative is to perform some DDL on (one of) the table(s) in the query. This still affects all queries using that table but is more targeted than flushing the cache. A trick I have learned from Oracle, where this is a common problem because all statements are cached in the shared pool, and which also works with Informix, is to perform a grant or revoke as the DDL statement, e.g.:

grant select on informix.logons to thompsonb as informix;

If done carefully, you can grant a privilege that is not needed and then revoke it again immediately afterwards.
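
For example, following the grant above, the privilege can be taken away again immediately (run while connected as the grantor or a DBA):

revoke select on informix.logons from thompsonb;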

One good thing is that when you prepare and then execute the query for which there is already a plan in the statement cache, the explain output will show you the cached plan. One clue that it is a cached plan and has not been reprepared seems to be that the estimated cost is shown as zero.

As a result of these tests, I now feel better equipped to investigate and deal with query performance issues on instances where the statement cache is switched on globally or enabled at session level.


-710: Table <table> has been dropped, altered, or renamed.

During a recent meet-up with some IBM developers, we brought up the topic of error -710: Table <table> has been dropped, altered, or renamed. This error usually occurs when a prepared SQL statement has been invalidated by a schema change of some kind and can make code deployment or adding indices difficult in a live environment while applications are running.

IBM told us they have made significant progress with this problem. Firstly, in version 11.10, the onconfig parameter AUTO_REPREPARE was added which facilitated the automatic re-preparing of an already prepared SQL statement internally by the Informix engine if it had been invalidated by a schema change. Unfortunately this fix didn’t cover all scenarios, something IBM were open about at the time and there is an article in the IDS Experts blog on developerWorks covering the progress that was made then.

However, IBM now tell us they have made further progress on this problem to the point now where the -710 error should occur less frequently, if at all.

The developerWorks article describes six scenarios where a -710 error could occur in Informix 10.00 or earlier and the first three of these scenarios are fixed in Informix 11.10 when AUTO_REPREPARE is switched on. I thought it would be useful to create a test case for each of these and see if the problem has indeed been resolved.

The article gives more information but the six scenarios are:

  1. An index created on a table where a select statement has already been prepared and executed will cause that select statement to fail when it is executed again.
  2. A similar scenario where the select statement has not been previously used, a cursor is used and the select uses select *.
  3. A more complex example involving two tables, one with a trigger, and two procedures for which some code is supplied.
  4. Changing the number and type of columns in a select statement.
  5. Executing a prepared DDL statement, presumably where other DDL has been run on the table.
  6. An unspecified race condition.

All of this, apart from the last item, the race condition, is fairly easy to script up and test, although one slight caveat is that I prefer Perl-DBI and know little about languages like ESQL/C so I don’t have explicit control of cursors.

Thanks to a chum, I got hold of a copy of Informix 10.00.FC10 and ran through tests one to five:

sandbox_shm thompsonb@informix1:~ > ./710_test_cases.pl all

Running test 1: $prepare s1 from "select c1, c2 from t710";$execute s1;$create index i1 on t710(c1);$execute s1; -------> -710 error

Dropping existing table 't710':
done.
Creating table 't710':
done.
Inserting some data into table 't710':
done.
Selecting values:

c1 c2
10 10
20 20
30 30
56 56

Creating index:
done.
Re-using prepared statement to select values:

c1 c2
DBD::Informix::st execute failed: SQL: -710: Table (thompsonb.t710) has been dropped, altered or renamed. at ./710_test_cases.pl line 80.
Closing statement handle:
done.

Running test 2: $prepare s1 from "select * from t710 where c1 = 10";$declare curs1 cursor for s1;$create index i1 on t710(c1);$open curs1; -------> -710 error

Dropping existing table 't710':
done.
Creating table 't710':
done.
Inserting some data into table 't710':
done.
Preparing select statement:
done.
Creating index:
done.
Using prepared statement to select values:

c1 c2
DBD::Informix::st execute failed: SQL: -710: Table (thompsonb.t710) has been dropped, altered or renamed. at ./710_test_cases.pl line 113.
Closing statement handle:
done.

Running test 3: create procedure p1(c_a int, c_b int) returning integer;insert into A values(1001, 1001);update A set b=c_b where A.a=c_a; ==>(You have an update trigger defined on A which inserts into table B return 0;end procedure; create procedure p2() returning integer;define i integer;let i=p1(56, 56);create index i1 on B(b);return p1(56, 56); -- > -710 error when p1 is executedend procedure;

Dropping existing table 'a':
done.
Creating table 'a':
done.
Inserting some data into table 'a':
done.
Dropping existing table 'b':
done.
Creating table 'b':
done.
Inserting some data into table 'b':
done.
Dropping existing procedure 'p1':
done.
Creating procedure 'p1':
done.
Dropping existing procedure 'p2':
done.
Creating procedure 'p2':
done.
Creating update trigger on table 'a' which inserts into table 'b':
done.
Preparing statement to execute procedure 'p2':
done.
Executing procedure 'p2':
DBD::Informix::st fetch failed: SQL: -710: Table (thompsonb.b) has been dropped, altered or renamed. at ./710_test_cases.pl line 167.
done.
Closing statement handle:
done.

Running test 4: The number and type of columns in your SELECT list have changed.

Dropping existing table 't710':
done.
Creating table 't710':
done.
Inserting some data into table 't710':
done.
Preparing select statement from table 't710':
done.
Adding extra column to table 't710':
done.
Selecting from table 't710' using prepared statement:

c1 c2
DBD::Informix::st execute failed: SQL: -710: Table (thompsonb.t710) has been dropped, altered or renamed. at ./710_test_cases.pl line 195.
done.
Closing statement handle:
done.

Running test 5: If you are executing a prepared DDL statement, you might get -710 errors.

Dropping existing table 't710':
done.
Creating table 't710':
done.
Inserting some data into table 't710':
done.
Prepare a statement to add column 'c3' to table 't710':
done.
Add a column 'c4' to table t710:
done.
Execute the prepared statement to add column 'c3' to table 't710':
DBD::Informix::st execute failed: SQL: -710: Table (thompsonb.t710) has been dropped, altered or renamed. at ./710_test_cases.pl line 227.
done.
Closing statement handle:
done.

Exiting cleanly.

Great (in a way): all the test cases give a -710 error in Informix 10.00! I’ve put my other results in a table; a fail indicates that I got the -710 error:

Test | Informix 10.00.FC10 | Informix 11.50.FC9W2 without AUTO_REPREPARE | Informix 11.50.FC9W2 with AUTO_REPREPARE | Informix 11.70.FC5W1 without AUTO_REPREPARE | Informix 11.70.FC5W1 with AUTO_REPREPARE
1    | Fail                | Fail                                        | Pass                                     | Fail                                        | Pass
2    | Fail                | Fail                                        | Pass                                     | Fail                                        | Pass
3    | Fail                | Fail                                        | Pass                                     | Fail                                        | Pass
4    | Fail                | Fail                                        | Pass                                     | Fail                                        | Pass
5    | Fail                | Fail                                        | Fail                                     | Fail                                        | Fail

Unfortunately I do not have Informix 11.10 to test with and it is no longer available for download. It would be interesting to know whether it passes test 4 or not with AUTO_REPREPARE set.

Is the failure of test 5 by the latest releases important? I would say not. Preparing a DDL statement, then preparing and executing another on the same table before executing the first prepared statement is certainly not something I’ve seen done before in a real environment.

If anyone would like to have a go at reproducing my results or has any comments on my implementation of the test cases, I have posted the code here.


Huge pages

When upgrading to version 11.70, it’s good to review the list of new features you can take advantage of. IBM has helpfully produced a technical white paper for this purpose. One of these features is huge (or large) pages on Linux, of benefit if your system has a large amount of memory allocated to Informix. The primary advantage is a reduction in the size of the (separate) pages tables used by processes and the system to map to physical memory.

In Linux huge pages were first supported by the 2.6 kernel and this feature was later back-ported to 2.4.21, although the implementation is not the same. This blog post mostly concerns itself with how huge pages work in x86_64 Linux 2.6 kernels although I’ll try and point out where any differences may lie in other implementations.

You can have a look at how much space is allocated to page tables in your system by looking in /proc/meminfo at parameter PageTables. For example I have a 40 Gb Informix instance running on a server with 128 Gb of memory, and page table entries totalling 1004 Mb (nearly 1 Gb) are needed to support the system.

Standard pages are 4 kb and point to a block of physical memory. In fact each page has a separate entry in a process page table, which then maps to a separate system page table, which in turn maps to physical memory. These page tables can contain a mix of standard and huge pages. By using huge pages, the block size increases to 2 Mb on x86_64 (16 Mb on Power PC, 256 Mb on Itanium and 1 Mb on System z). My 40 Gb instance would need 10,485,760 page table entries to support it using standard pages but just 20,480 entries using huge pages. A page table entry can be up to 64 bytes.

In fact the gains are even better than this because modern CPUs use a Translation Lookaside Buffer (TLB) to cache the page tables and these are of a fixed size, typically able to hold a few thousand entries. There is a good Wiki article that explains this in more detail. Page tables containing lots of standard pages therefore lead to more TLB misses where the operating system has to fetch other parts of the page table from system memory.

Huge pages are not used system-wide. Your system administrator must allocate an area of memory to huge pages as follows:

sysctl -w vm.nr_hugepages=<no. of huge pages>

Note that <no. of huge pages> is the number of pages and not in Mb. On 2.4 kernels the parameter is vm.hugetlb_pool.

It may be necessary to reboot your server so that Linux can allocate the memory.

For an Informix instance or set of instances running on a server, a sensible size would be the total memory footprint of all the instances. This can be easily obtained by running onstat -. You might also want to allocate some space for dynamic memory allocations too, although these can use standard pages if no huge pages are available.
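
As a sketch of the arithmetic for the 40 Gb instance mentioned earlier, with 2 Mb huge pages:

# 40 Gb / 2 Mb per huge page = 20480 huge pages
sysctl -w vm.nr_hugepages=20480
# add an equivalent vm.nr_hugepages line to /etc/sysctl.conf to preserve the setting across reboots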

On Linux (and Solaris) Informix will automatically use huge pages if enough huge pages have been allocated and the RESIDENT flag is set in onconfig to -1 or to a high enough value to cover your segments. You can control this with the IFX_LARGE_PAGES environment variable.

It’s important to understand that huge pages can only be used by processes that support them and cannot be swapped out so you need to leave normal pages for the operating system and any other processes.

On start-up, the server will put a message in the online log to show that huge pages are being used:

10:00:00 IBM Informix Dynamic Server Started.
10:00:00 Shared memory segment will use huge pages.
10:00:00 Segment locked: addr=0x44000000, size=39520829440
10:00:00 Shared memory segment will use huge pages.
10:00:00 Segment locked: addr=0x977a00000, size=42949672960

You might expect that onstat -g seg would then show you that huge pages are in use for a given segment but this is not the case.

What happens if the server needs to allocate an extra virtual segment? As usual the value of SHMADD will determine the size of the segment and Informix will check to see if there are sufficient huge pages available for it. If not, it will use normal pages and a message like the below will appear in the online log:

10:30:00 Warning: Server is unable to lock huge pages in memory.
Switching to normal pages.
10:30:00 Dynamically allocated new virtual shared memory segment (size 1048576KB)
10:30:00 Memory sizes:resident:38594560 KB, virtual:43013440 KB, no SHMTOTAL limit
10:30:00 Segment locked: addr=0x1378f50000, size=1073741824

You can monitor huge page use, again using /proc/meminfo:

> cat /proc/meminfo | grep HugePages
HugePages_Total: 40960
HugePages_Free:   2518
HugePages_Rsvd:    883

Comparing this with the output from onstat -g seg I have:

Segment Summary:
id         key        addr             size             ovhd     class blkused  blkfree 
38404103   52bb4801   44000000         39520829440      463568504 R*    9648205  435     
38436872   52bb4802   977a00000        42949672960      503318520 V*    8793768  1691992 
Total:     -          -                82470502400      -        -     18441973 1692427

No obvious relationship? I know from the online log that both the virtual and resident segments are using huge pages. If we take the total huge pages, subtract the free and add the reserved, we get (40960 - 2518 + 883) = 39325 pages. If we convert that into bytes: (39325 * 2048 * 1024), we get 82470502400 which is the total size of the two segments.


RSS, delayed apply and log staging directory

After a couple of RSS troubles this week, I thought I’d do a quick post to cover a few points which are not covered in the IBM documentation.

Using RSS with delayed apply brings into play the log staging directory, controlled by the onconfig parameter LOG_STAGING_DIR. When RSS is running normally you will only see enough logs in here to support the apply delay you’ve set. IBM don’t appear to make a recommendation on how much space to allow for this directory and you might consider it reasonable to work out how many logs you’ll get through during your busiest time and add on a safety margin. My recommendation is that it needs to be large enough to accommodate a full set of online logical logs and furthermore it should reside on a separate file system to ensure this space is always available.

The process of pairing up an RSS secondary with the primary server in your cluster takes place only after you’ve backed up your primary and restored it on the RSS secondary, so you might have gone through quite a few logical logs in this time. You may also end up with a large gap if your network link has failed, for example. The use of the staging directory decouples the shipping of the logs from the primary to the RSS server from their application. So once the primary starts shipping logs to the RSS server, because of the log staging directory, it can send them as fast as your network link will allow and just write them to the staging directory unhindered by the apply process. This could be much faster than the apply process and the staging directory could rapidly fill up. If you run out of space in the log staging directory this will be reported on the RSS server:

ERROR:log staging aborted due to IO error (errno:28)
No space left on device

To fix this I just let the RSS server run: it applied and deleted the logs in the staging directory as it went, after which it stopped applying. Then I simply restarted it.

A little gremlin I have found is that you may hit a small problem if you use RSS and have implemented role-separation. Role-separation allows you to change the group of the $INFORMIXDIR/etc directory to a group of which your DBAs are a member.

The manual states:

After the installation is complete, INF_ROLE_SEP has no effect. You can establish role separation manually by changing the group that owns the aaodir, dbssodir, or etc directories. You can disable role separation by resetting the group that owns these directories to informix. You can have role separation enabled for the AAO without having role separation enabled for the DBSSO.

Role separation control is through the following group memberships:

  • Users who can perform the DBSA role are group members of the group that owns the directory $INFORMIXDIR/etc.
  • Users who can perform the DBSSO role are group members of the group that owns the $INFORMIXDIR/dbssodir directory.
  • Users who can perform the AAO role are group members of the group that owns the $INFORMIXDIR/aaodir directory.

Note: For each of the groups, the default group is the group informix.

And the RSS documentation states:

The directory specified by the LOG_STAGING_DIR configuration parameter must be secure. The directory must be owned by user informix, must belong to group informix, and must not have public read, write, or execute permission.

So when you set up RSS for the first time you won’t be surprised to find that a subfolder, owned by group informix, is created under the folder specified as LOG_STAGING_DIR in your onconfig file.

All well and good, but when you restart the server you then see a message like:

Secondary Delay or Stop Apply: The log staging directory () is not secure.

The directory will be as specified in the manual so this message will be unexpected. Manually altering the group on the log staging directory to the group that owns $INFORMIXDIR/etc and restarting the RSS server fixes the problem.
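
The fix itself is a one-liner; a sketch, where ifx_dba is a hypothetical DBSA group owning $INFORMIXDIR/etc and the path is a placeholder for the subdirectory created under LOG_STAGING_DIR:

chgrp ifx_dba /path/to/log_staging/subdirectory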