Prepared statements and the SQL statement cache

Recently I was asked whether I use the SQL statement cache. The answer was no but the more interesting question was why not. When I thought about the reasons why not most of them were out of date or boiled down to a lack of understanding of the finer details on my part.

Thinking about it for a little longer, Informix has always been efficient at parsing or optimising SQL statements and this is an area where it seems to scale without difficulty so I have never had great cause to turn it on. However, Informix must also be using extra CPU cycles for parsing and optimising when a good plan could be ready to go in the cache.

A performance test could be the subject of a future blog post but I want to look at controlling the cache. For example, what happens if a “bad plan” gets cached? How would I get it out of there or force a re-parse and re-optimisation?

As ever, I will be using my trusty sandbox to investigate, currently running Informix 11.70.FC5W1 with STMT_CACHE set to 1 (enabled at session level).

For my tests I set up a log table, the sort of thing that could be used to record user log-ons, with two columns: a varchar for the user name and a date/time column for the log-on time. Both these columns are separately indexed. I will query the table using both columns in the where clause, giving the optimiser the choice of using either index (or a full-scan, in practice not used) to read from the table. The log-on time index is useful for queries intended to show the latest log-ons, independent of the user, but Informix may also be tempted to use it if the expected number of values returned is low and it is deemed to be more selective than the index on the user name. For the queries I am going to run, the user name index will generally be the most efficient but might not always be part of the plan chosen.

I’ll execute the same prepared statement twice with two sets of bind variables. In my tests I want to find out:

  • In the case where the statement cache is off or the statement is not in the cache, what determines the initial query plan?
  • Once the query is cached, what can I do, short of flushing the entire cache, to force a re-parse?

The schema for my test is:

create schema authorization informix
create table logons (
  logon_timestamp datetime year to second,
  username varchar(32)
) extent size 460000 next size 20000 lock mode row;

create index informix.ix_username on informix.logons (username) using btree;
create index informix.ix_logon_timestamp on informix.logons (logon_timestamp) using btree;

I have around 8 million unique rows in this table and high level statistics at resolution 0.5 on both columns.

The Perl code snippet I am going to test with is:

$$dbh->do("SET EXPLAIN ON");
$$dbh->do("SET STATEMENT CACHE OFF"); # Change to ON, as required

my $sql = "SELECT * FROM logons WHERE logon_timestamp BETWEEN ? AND ? AND username=?";
my $sth = $$dbh->prepare($sql);

# Execute prepared statement without any bind variables (forces optimisation). Comment out to test optimisation with bind values.
$sth->execute();

# Execute prepared statement with given bind variables: change as required
$sth->execute('2010-01-11 12:50:50', '2013-01-11 12:50:50', 'BTHOMPSON');
my $count = 0;
while (my @columns = $sth->fetchrow_array) {
    print "$columns[0]\n";
    $count++;
    last if $count == 10;
}

# Execute prepared statement with given bind variables: change as required
$sth->execute('2013-01-11 12:50:50', '2013-01-11 12:50:50', 'BTHOMPSON');
$count = 0;
while (my @columns = $sth->fetchrow_array) {
    print "$columns[0]\n";
    $count++;
    last if $count == 10;
}

$sth->finish();

The key thing about the code is that the statement uses bind variables and is prepared only once.

Some initial results are (with the statement cache off):

Lower timestamp bind value Upper timestamp bind value User name bind value Index used
2013-01-11 12:50:50 2013-01-11 12:50:50 BTHOMPSON ix_logon_timestamp
2010-01-11 12:50:50 2013-01-11 12:50:50 BTHOMPSON ix_username
<None> <None> BTHOMPSON ix_logon_timestamp

So far, so fiddled. But one interesting result has already dropped out. If I prepare and optimise the statement without any bind values I fix the plan. I achieve this in Perl by using the $sth->execute function but I don’t provide any bind values and don’t fetch any rows. When I re-execute the statement with bind variables I find the statement has already been optimised and subsequent executions will use the same plan. I had expected that I would have to supply some real bind variables but this appears to be the case even with no bind variables initially supplied. I am not sure what this means in practice, since you probably not do this in your code, but it is an interesting result nonetheless. It is certainly not the same as binding blank values or nulls and Informix will generate an explain plan for the query the first time $sth->execute is called.

Let’s switch on the statement cache (SET STATEMENT CACHE ON) and see if there are any differences. Well, there are not the first time the script is run but subsequent runs with different settings will re-use the initial plan. We need to flush the cache with onmode -e flush each time to force the plan to be reparsed.

We can see that there is now a statement cache entry with onstat -g ssc:

> onstat -g ssc

IBM Informix Dynamic Server Version 11.70.FC5W1 -- On-Line -- Up 6 days 02:45:27 -- 1224528 Kbytes

Statement Cache Summary:
#lrus   currsize             maxsize              Poolsize             #hits   nolimit 
4       24440                524288               40960                0       0       

Statement Cache Entries: 

lru hash ref_cnt hits flag heap_ptr      database           user
--------------------------------------------------------------------------------
  0   51       0    0   -F 83df8038      optimiser_test     thompsonb
  SELECT * FROM logons WHERE logon_timestamp BETWEEN ? AND ? AND username=?



    Total number of entries: 1.

As I mentioned before, in this example using the plan using the index on logon_timestamp is generally a poor choice for anything other than the smallest time ranges. As a common plan for this query, the index on user name would be the best. So what am I to do when the statement cache is on and the first used bind values caused the optimiser to settle on the index on logon_timestamp?

onmode -e flush is going to work but it’s a bit of a sledge hammer and might mean I have to watch for other queries being re-optimised badly. Another alternative is to perform some DDL on (one of) the table(s) in the query. This still affects all queries using that table but is more targetted than flushing the cache. A trick I have learned from Oracle, where this is a common problem as all statements are cached in the shared pool, and which also works with Informix is to perform a grant or revoke as the DDL statement, e.g.:

grant select on informix.logons to thompsonb as informix;

If done carefully, you can grant a privilege that is not needed and then revoke it again immediately afterwards.

One good thing is that when you prepare and then execute the query for which there is already a plan in the statement cache, the explain output will show you the cached plan. One clue that it is a cached plan and has not been reprepared seems to be that the estimated cost is shown as zero.

As a result of these tests, I now feel better equipped to investigate and deal with query performance issues on instances where the instance cache is switched on globally or enabled at session level.