Improving remote query performance by tuning FET_BUF_SIZE

I thought I’d write blog post as a nice example of where tuning the client-side variable, FET_BUF_SIZE, really speeded up a remote query.

FET_BUF_SIZE is documented by IBM in the context of a Java application using JDBC here and as a server environment variable here.

One thing the documentation warns about is that simply setting this to a high value may degrade performance, especially if you have a lot of connections. With that in mind here are some facts about the query I’m running and using as a basis for these tests:

  • I am just using a single connection to the database.
  • the query returns around 10000 rows and 60 Mb of data.
  • the client and the server are geographically separated from each other and Art Kagel’s dbping utility typically takes around 0.1 seconds to connect remotely; this compares with around 3 milliseconds locally.
  • crucially the query runs in seconds locally on the server but takes over three minutes when run remotely.

If I begin running the query with the default value of FET_BUF_SIZE and monitor waits on the server, I can see that reads only go up slowly and that my session is waiting on a condition (indicated by the Y in position one of column two) more or less all the time:

> while [ 1 ] ; do
> onstat -u | grep thompson
> sleep 1
> done
Userthreads
address flags sessid user tty wait tout locks nreads nwrites
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 552 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 552 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 560 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 560 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 568 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 576 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 592 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 624 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 624 0

The sixth column shows the rstcb value of the thread I’m waiting on. I can use onstat -g con (print conditions with waiters) to see that I’m waiting on the network:

> onstat -g con | grep -E '^cid|26e67cd298'
cid addr name waiter waittime
5789 26e67cd298 netnorm 84353 0

A quick check with onstat -g ses 76228 shows that thread id. 84353 does indeed correspond to my session.

While the wait time shown above is not increasing it’s a different story when we look at netstat, again on the server:

> netstat -nc | grep '172.16.0.1'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1312 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1284 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1306 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1302 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1194 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1206 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1266 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1304 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1318 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED
tcp 0 1248 10.0.0.1:9088 172.16.0.1:37004 ESTABLISHED

What the above is showing us is that there are consistently around 1200 to 1300 bytes in the send queue (Send-Q). This is surely our bottleneck.

At this point when investigating the problem I considered modifying other parameters such as OPTOFC and Linux kernel parameters. However with a few moment’s thought it was clear these weren’t going to gain anything: OPTOFC optimises the open-fetch-close sequence and for a single long running query this is not going to give us anything measurable; and an investigation into increasing the Linux kernel parameter related to the send queue size was dismissed when we found that 1300 bytes was well below the maximum allowed.

In Informix 11.50 the maximum value of FET_BUF_SIZE is 32767 (32 kb) but this is increased to 2147483648, or as we’ll see actually 2147483647, (2 Gb) in 11.70 and above. We can therefore move onto to experiment with different values:

FET_BUF_SIZE Query run time (s) Average Send-Q size over 10 samples Maximum Send-Q size observed
Default 221.2 1274 1332
1024 221.1 1255 1326
2048 221.1 1285 1338
4096 221.2 1297 1360
6144 102.1 2564 2676
8192 56.6 5031 5210
16384 22.6 12490 13054
32767 (max. 11.50 value) 11.5 24665 29968
65536 7.0 62188 62612
131072 4.9 115793 127826
262144 4.0 146686 237568
524288 3.5 184320 249856
1048576 3.3 245760 473616
2097152 3.2 249856 486352
2147483647 (max. value – 1) 3.0 245760 549352
2147483648 (supposed max. value) 221.3 1276 1366

As the run times get shorter it gets tricky to measure the Send-Q using netstat -nc: it can be sampled very frequently using a command like:

while [ 1 ] ; do
netstat -n | grep '172.16.0.1'
done

This will produce many measurements per second and with this it’s possible to see it fill up and drain several times in the period while the statement is running.

It’s also interesting to play around with the boundaries. For example, with a FET_BUF_SIZE between around 5500 and 5600 maximum Send-Q sizes the same as those consistently achieved with a FET_BUF_SIZE of 6144 begin to creep into the results but many measurements remain around the values consistently measured wit a FET_BUF_SIZE of 4096:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1316 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1318 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1278 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1352 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1288 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 2546 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1278 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 2502 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1266 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1314 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 2506 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED
tcp 0 1292 10.0.0.1:9088 172.16.0.1:37488 ESTABLISHED

So what are the conclusions?

  • Increasing FET_BUF_SIZE at the client side can dramatically improve the speed of remote queries.
  • Maximum Send-Q sizes, as measured by netstat, increase in discrete steps as FET_BUF_SIZE is increased.
  • A larger Send-Q allows more data to be cached and reduces waits seen in Informix.
  • To see any improvement at all FET_BUF_SIZE must be increased to at least 6000 (approximate value).
  • Around boundaries between maximum Send-Q sizes there appears to be a cross-over region where maximum send queue sizes overlap from two adjacent values are seen from one second to the next.
  • The maximum value allowed in 11.70 at least is 2147483647 and not 2147483648, as indicated in the documentation.
  • The maximum 11.50 value of 32767 produced a run time nearly 4x slower than an optimised value for 11.70+
  • Other testing I did, not documented here, shows that the results are uniform across JDBC and ESQL/C applications.

Note: all user names, IP addresses and port numbers used in this post have been altered.


Informix or Client SDK install: No Java virtual machine could be found

This is a something of a note to self. For some time it has been been the case that you may see this message when attempting an Informix server or Client SDK install if there is a problem starting the installer’s Java runtime environment:

# LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64 ./ids_install
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
No Java virtual machine could be found from your PATH
environment variable. You must install a VM prior to
running this program.

To add insult to injury when this condition occurs the installer exits with status code zero, suggesting all is ok.

Now the obvious thing to do seems to be to install a Java package, wondering whether OpenJDK will suffice or the official Oracle version is needed. This is never the answer! The Informix installer comes bundled with its own Java run time environment (JRE) which gets extracted into /tmp/install.dir.X and your challenge is in fact to find out why it isn’t working as it should.

You can see in my attempt at installing the product I have already prefaced the command with LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64. This is already a known way of fixing some installation problems. (For a 32-bit version you’d simply use /usr/lib:/lib.)

Everyone’s friend, strace, is a great way to start investigating this problem. In amongst the output I find this:

faccessat(AT_FDCWD, "/tmp/install.dir.12813/Linux/resource/jre/jre/bin/java", X_OK) = -1 EACCES (Permission denied)

So why is this? I am logged in as root so I ought not be running into permission denied issues.

The core problem here is the way /tmp, which is a separate filesystem on my machine, is mounted. From the mount command output:

tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)

The key part here is the noexec flag which is a security feature preventing execution of binary files residing on this filesystem.

The best way to fix this is to set the environment variable IATEMPDIR to a directory on a filesystem where execution is allowed. I usually use /root for this purpose. And success:

# export IATEMPDIR=/root
# LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64 ./ids_install
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...

Launching installer...

Preparing CONSOLE Mode Installation...

While the above should be sufficient I have seen the server installer still fail to work even with this environment variable set as some files may still be placed in /tmp. In this situation you can temporarily remove the security restriction with:

mount -o remount,rw,nosuid,nodev,relatime,exec /tmp

and switch it back on again with:

mount -o remount,rw,nosuid,nodev,relatime,noexec /tmp

I suggest before running the above you check the existing mount options for your /tmp filesystem.


Intermittent “CSM: authentication error” with JDBC

This article will only concern you if:

  • you connect using JDBC.
  • you use simple password encryption.

Simple password encryption just does one thing: it encrypts the password sent to the database server in transit preventing it from being obtainable by network packet sniffing.

At the client end configuring simple password encryption can be done simply by adding
";SECURITY=PASSWORD"
to your JDBC connection string.

At the server end set up a DBSERVERALIAS and add
csm=(SPWDCSM)
to the fifth field in sqlhosts and set environment variable INFORMIXCONCSMCFG before starting the instance to point to a file containing something like:
SPWDCSM("/opt/informix/lib/csm/libixspw.so", "", "p=1")

This is covered in more detail elsewhere and I haven’t covered using CSDK but for JDBC connections it’s all there is to know.

Unfortunately there is a bug in JDBC 3.70.JC8W1 and JDBC 4.10.JC7 and below where every 100th connection attempt or so will fail randomly with this stack:

java.sql.SQLException: CSM: authentication error.
at com.informix.jdbc.IfxSqliConnect.(IfxSqliConnect.java:1337)
at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.informix.jdbc.IfxDriver.connect(IfxDriver.java:243)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at Connect.main(Connect.java:30)
Caused by: com.informix.asf.IfxASFRemoteException:
at com.informix.asf.Connection.recvBindResponse(Connection.java:1363)
at com.informix.asf.Connection.establishConnection(Connection.java:1619)
at com.informix.asf.Connection.(Connection.java:392)
at com.informix.jdbc.IfxSqliConnect.(IfxSqliConnect.java:1232)
... 7 more

You can see if you’re vulnerable by compiling this app and running it until it fails or you’re reasonably confident you don’t have a problem:

public class Connect
{
public static void main(String[] args)
{
Connection conn = null;
String url = "jdbc:informix-sqli://hostname:port/dbname:INFORMIXSERVER=informixserver;user=user;password=password;SECURITY=PASSWORD";
System.out.println(url);

try
{
Class.forName("com.informix.jdbc.IfxDriver");
}
catch (Exception e)
{
System.out.println("FAILED to load Informix JDBC driver.");
e.printStackTrace();
return;
}

int i=0;
while (true) {
i++;
try
{
conn = DriverManager.getConnection(url);
}
catch (SQLException e)
{
System.out.println("FAILED to connect! "+e);
e.printStackTrace();
}
System.out.println("Connected " + i);
if (conn != null) {
try {
conn.close();
}
catch (SQLException e) {
System.out.println("FAILED to disconnect! "+e);
e.printStackTrace();
}
}
}
}
}

If your application handles fails connections and retries automatically you might not have noticed this error or perhaps it was lost in the noise but for more simple applications it can be a pain.

Fortunately this is fixed in JDBC 4.10.JC8 and the fix works with 11.70 and 12.10 versions of the servers.

Interestingly the JDBC release notes for 4.10.JC8 are coy about this, showing just one fix.


Buffer waits

Is it really a year since I last wrote a blog post? It does mean I have had time to think of a few topics to write about.

Recently a desk visitor came to me at work about a performance issue. From what he said I got the impression that he thought that DBAs spend most of their time tuning SQL queries, something I spent little time on during a typical day, perhaps one of the advantages of working on a mature system.

Perhaps though he was kind of right. Many of the things I do are around making sure queries run reliably, consistently and in a scalable manner, I just don’t necessarily do this by looking at explain plans. Don’t get me wrong: these are very important; it’s just that once these are as good as they can be it doesn’t mean there aren’t other ways of finding bottlenecks or contention and tuning them out. For example when I was a more junior DBA I used to concern myself with buffer cache hits and, once I knew a little more, buffer turnover ratios, reducing I/O by allocating more memory as Moore’s Law provided rapid improvements in server CPU speed and memory size.

In the Oracle world DBAs have moved away from this measure and use the Oracle wait interface which allows you to see what operations the engine was doing. We can do this (slightly differently) in Informix too and it’s very useful. The simplest overview of waits is to look at position one of the flags column from “onstat -u”. The Informix documentation states:

flags
Provides the status of the session.
The flag codes for position 1:
B Waiting for a buffer
C Waiting for a checkpoint
G Waiting for a write of the logical-log buffer
L Waiting for a lock
S Waiting for mutex
T Waiting for a transaction
Y Waiting for condition
X Waiting for a transaction cleanup (rollback)

Most sessions will probably show Y which usually means they are waiting on TCP transit or are idle, waiting for the application or user’s session to do something.

As the manual says we can get more detail for buffers through onstat -b, -B and -X; for latches (mutexes) through onstat -s, (also -g lmx and -g wmx) and locks with onstat -k.

onstat -X is the most useful way to examine buffer waits but it is not as user-friendly as it could be, which is a shame because it often offers many clues to where problems lie:

Buffers (Access)
address owner flags pagenum memaddr nslots pgflgs scount waiter

Buffer pool page size: 2048
14700 modified, 16777216 total, 16777216 hash buckets, 2048 buffer size

Buffer pool page size: 4096
2443cd7e8 ffffffffffffffff 80 25:1523486 cce17b000 101 2801 0 0
267ca4628 0 0 47:1570054 105c3c5000 122 2890 1 0
53980 modified, 8388608 total, 8388608 hash buckets, 4096 buffer size

Buffer pool page size: 8192
59577 modified, 8388608 total, 8388608 hash buckets, 8192 buffer size

Buffer pool page size: 16384
3784a8188 ffffffffffffffff 80 162:18904760 4baadf4000 248 890 0 0
37854d188 ffffffffffffffff 80 162:24581408 4baeff4000 248 890 0 0
378ead5e8 ffffffffffffffff 80 124:25597240 4beb010000 187 2801 0 0
378f781a8 ffffffffffffffff 80 124:25597376 4bf0128000 710 890 0 0
3798d3388 ffffffffffffffff 80 124:25597176 4c2bf34000 710 890 0 595236d428
3799321a8 ffffffffffffffff 80 162:13196672 4c2e528000 248 890 0 624d39d668
37a353128 ffffffffffffffff 80 124:25597840 4c6f258000 197 801 0 0
37a4cefe8 ffffffffffffffff 80 168:32233760 4c78a50000 399 890 0 0
37c485d28 ffffffffffffffff 80 264:13942672 4d439d8000 319 890 0 0
37c5b45c8 ffffffffffffffff 80 162:24158848 4d4b2dc000 193 2801 0 0
37c80f368 ffffffffffffffff 80 168:33303832 4d5a400000 303 890 0 0
37caf6ce8 0 10 124:25597160 4d6cd70000 710 890 1 0
37ceaab28 ffffffffffffffff 80 166:8227296 4d84898000 332 890 0 0
37ceba8e8 ffffffffffffffff 80 124:25597648 4d84ef0000 710 890 0 0
37d70f4a8 ffffffffffffffff 80 124:25597208 4dba408000 193 801 0 0
37d891088 ffffffffffffffff 80 162:26376432 4dc3e54000 248 890 0 0
37dc9abe8 58cc3c7eb8 80 144:18435488 4dddbd0000 193 2801 0 0
87962 modified, 13762560 total, 16777216 hash buckets, 16384 buffer size

The key problem here from a usability point of view is that it is tedious to convert the chunk:pagenum format into an actual database object.

There is a similar problem with part numbers when deciphering output from, for example onstat -k that can be solved by downloading and compiling the ESQL/C utility partn from the IIUG software repository.

Loosely based on that here is my Perl script, chunkpg, which can provide friendly names for objects in chunk:pagenum format:

#!/usr/bin/perl

# Decipher chunk and page numbers

use strict;
use warnings;

&main;

sub main {

my $key = &check_params;

my $chunkno = 0;
my $nextinfo = 0;
my $pagesize = 0;
my $count = 0;
my $lastdbs = '';
my $syspagesize = 2; # change to 4 kb if required

my (%objs, %start, %end, %c, %ps, %dbs);

open (ONCHECKPE, "oncheck -pe |");
while () {
chomp;
if ($nextinfo == 1) {
$nextinfo = 0;
my ($blank, $path, $size, $used, $free);
($blank, $chunkno, $path, $pagesize, $size, $used, $free) = split / +/;
# print "$chunkno: $pagesize kb\n";
$count = 0;
$c{$chunkno} = 0;
$ps{$chunkno} = $pagesize;
}
elsif ($_ eq ' Chunk Pathname Pagesize(k) Size(p) Used(p) Free(p)') {
$nextinfo = 1;
}
elsif ($_ =~ /^DBspace Usage Report: /) {
my @parts = split / +/;
$lastdbs = $parts[3];
$dbs{$chunkno} = $lastdbs;
}
elsif ($_ =~ /^ \w+:\'\w+\'\.\w+ +/) {
$count++;
my ($blank, $obj, $offset, $size) = split / +/;
# printf ("%s: %d -> %d\n", $obj, $offset*2, $offset*2+$size*2);
$objs{$chunkno}{$count} = $obj;
$start{$chunkno}{$count} = $offset;
$end{$chunkno}{$count} = $size + $offset;
$c{$chunkno}++;
if (!$dbs{$chunkno}) {
$dbs{$chunkno} = $lastdbs;
}
}
}
close (ONCHECKPE);

while () {
chomp;
print "$_ ";
$_ =~ s/^ +//;
my @vals = split / +/;
if ($vals[$key-1] && $vals[$key-1] =~ /\d+\:\d+/) {
my ($chunk, $page) = split /\:/, $vals[$key-1];
for (my $i = 1; $i = $start{$chunk}{$i} && $end{$chunk}{$i} && $page/($ps{$chunk}/$syspagesize) <= $end{$chunk}{$i}) {
print $objs{$chunk}{$i}.','.$dbs{$chunk};
last;
}
}
}
print "\n";
}
}

sub check_params () {

die print "INFORMIXDIR is not set.\n" unless $ENV{'INFORMIXDIR'};
die print "INFORMIXDIR is not set to a valid directory (\'$ENV{'INFORMIXDIR'}\').\n" unless -d $ENV{'INFORMIXDIR'};
die print "INFORMIXSERVER is not set.\n" unless $ENV{'INFORMIXSERVER'};

if ($ARGV[0] && $ARGV[0] eq '-k') {
if ($ARGV[1] && $ARGV[1] !~ /\D/) {
return ($ARGV[1]);
}
else {
print "Invalid key number: $ARGV[1]\n";
exit 1;
}
}
elsif ($ARGV[0]) {
print "Invalid parameter: $ARGV[1]\n";
exit 1;
}
return (1);
}

You can then run as follows: onstat -X | chunkpg -k 4

If you were to run onstat -X repeatedly, perhaps at different times of day, you would begin to build a picture of where buffer waits are occurring.

Having identified buffer waits, what can be done to reduce them?

Ultimately it is going to come back to your database design, including its storage schema, the queries you run and maybe onconfig parameters.

Buffer waits on tables can be reduced using partitioning. Fragmentation by round-robin is effective for improving the rate of concurrent inserts (although it doesn’t facilitate fragment elimination in queries). Very large tables may require some form of partitioning to avoid reaching the 16.7m (2^24) page limit but even smaller tables with a large number of inserts and/or deletes can benefit.

It’s worth noting here that buffer waits occur in server RAM and so it’s not obvious at all that the storage schema should come into it. After it only directly affects what happens on disk, right? Not so: I can show that different storage schemas perform differently by running a concurrent insert test on a machine tuned to only flush data to disk at checkpoints. If a table has 10-way round-robin partitioning then it has ten different part numbers and is for many purposes ten different tables. For inserts we are always appending to the end of the table and so there will be contention on the last buffered page between sessions. Spreading this across ten different partitions reduces it.

Indices can also see a large number of buffer waits. In one stark example I found a large number of buffer waits were occurring on a large single-column index where every value in the column was null (nulls are indexed in Informix, unlike some other RDBMSs) and so there was a single leaf node pointing to all rows. Every insert and delete needed to modify this. A less extreme version of this might be seen where there are a limited number of values a field can take. In this case dropping the index or combining the index column with another to vastly increase the number of nodes would help throughput.

Would index fragmentation/partitioning help? It is probably less useful, in general because there is no 16.7m page limit for indices since version 11.70 and, in the case of fragmentation by expression, the same nodes could be equally congested.

In conclusion I think this method is a useful addition to your performance tuning armoury and by reducing contention you can improve the scalability of your system and increase throughput.


Zone reclaim mode

Non-uniform memory access or NUMA is not a new concept but high end multiprocessor Intel-based servers are increasingly configured with this architecture, bringing it more to the mainstream. Put simply NUMA means that instead of all processors accessing your main system memory through a common bus, each processor is allocated an even share of the memory that it can address directly. If a processor needs to access memory controlled by another processor it can do so through that other processor.

Linux kernels from v2.5 onwards are aware of any NUMA architecture and it can be displayed using numactl -H or numactl –hardware:

node distances:
node 0 1 2 3
0: 10 21 21 21
1: 21 10 21 21
2: 21 21 10 21
3: 21 21 21 10

The above is from a four socket server. It shows that fetching from local memory is weighted at ’10’ and from memory controlled by other processors ’21’. I strongly suspect these weightings are hard coded.

numactl -H also shows information about how the memory is split between processors. The term ‘node’ is used:

available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 65418 MB
node 0 free: 310 MB
node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 1 size: 65536 MB
node 1 free: 41 MB
node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 2 size: 65536 MB
node 2 free: 82 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 65536 MB
node 3 free: 43 MB

What the above shows is that the free memory available to each node varies. If a process running on node 3, in our example, needs to allocate memory and it needs more than 43 Mb, it can either:

  • Use memory assigned to another node, for example node 0. This means the memory access will not be local.
  • Reclaim memory from node 3’s local memory by evicting other pages from memory.

The kernel switch vm.zone_reclaim_mode controls which behaviour is used. If set to 1 it will prefer to evict other pages from memory.

This is explained in a great more detail in this article by Christoph Lameter.

How is this parameter set on your system? You can check by running cat /proc/sys/vm/zone_reclaim_mode

If it’s set to 1 on your Informix system you should definitely read on. You’ll be glad to hear this parameter can be changed dynamically.

In the latest kernels (2014 onwards) this commit means that the parameter will never be set on your system automatically but if you’re running an enterprise Linux you could be on a kernel version like 2.6.32 (RHEL 6) where this can occur: although patched the base version of this dates from 2009.

I am not sure of the exact criteria that determine when older Linux kernels will switch on this feature at boot up. I think you need a modern four (or more) processor server with a NUMA architecture but there may be other requirements.

It’s interesting to read the slightly repetitious kernel commit log:

When it was introduced, zone_reclaim_mode made sense as NUMA distances punished and workloads were generally partitioned to fit into a NUMA node. NUMA machines are now common but few of the workloads are NUMA-aware and it’s routine to see major performance degradation due to zone_reclaim_mode being enabled but relatively few can identify the problem.

Those that require zone_reclaim_mode are likely to be able to detect when it needs to be enabled and tune appropriately so lets have a sensible default for the bulk of users.

This patch (of 2):

zone_reclaim_mode causes processes to prefer reclaiming memory from local node instead of spilling over to other nodes. This made sense initially when NUMA machines were almost exclusively HPC and the workload was partitioned into nodes. The NUMA penalties were sufficiently high to justify reclaiming the memory. On current machines and workloads it is often the case that zone_reclaim_mode destroys performance but not all users know how to detect this. Favour the common case and disable it by default. Users that are sophisticated enough to know they need zone_reclaim_mode will detect it.

Hopefully now the relevance to Informix is becoming a little clearer. Certainly there has been much complaining in the PostgreSQL community about this parameter. Another frustrated blog post describes some of the massive I/O latency problems it can cause on your system even when under no obvious memory pressure.

On our Informix system, which uses huge pages, we have experienced long disruptive checkpoints as a result of zone reclaiming. As huge pages are not swappable, it’s likely to be our monitoring and other non-Informix processes provoking the zone reclaims.

The long checkpoint behaviour can be summarised as:

  • A checkpoint is triggered by CKPTINTVL.
  • Informix instructs all threads to finish what they are doing and goes into state CKPT REQ.
  • One or more threads may be in critical section and must continue to the end of this section before it can stop.
  • A zone reclaim is occurring and I/O throughput dramatically decreases and this thread takes many seconds to come out of critical section.
  • All active threads wait (state C in the first column of onstat -u).
  • Eventually the operation completes, the checkpoint actually occurs very quickly and processing continues.

This behaviour can occur in later versions of the engine with non-blocking checkpoints.

If you have the mon_checkpoint sysadmin task enabled (I strongly recommend this), information about your checkpoints will be written to sysadmin:mon_checkpoint. (Otherwise you only retain information about the last twenty checkpoints visible through onstat -g ckp.) A tell tale sign is a large crit_time, nearly all of the checkpoint duration, and a much smaller flush_time.

You can get further evidence of whether a zone reclaim might be occurring at the same time by looking at the number of pages scanned per second in the output from sar -B. (sar is a very sophisticated monitoring tool these days with views into many aspects of the operating system.)

One test you can try (on a test server) is LinkedIn Engineering’s GraphDB simulator. It’s a C++ program that mimics the behaviour of GraphDB and is designed to provoke zone reclaim behaviour from the Linux kernel if it is switched on.

On our test system we can leave it running for hours without zone reclaim enabled and monitor it through sar -B.

10:30:55 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
10:31:00 AM 951.42 20993.52 8415.59 0.81 1351.62 0.00 0.00 0.00 0.00
10:31:05 AM 294.97 20930.38 8764.59 2.21 3286.92 0.00 0.00 0.00 0.00
10:31:10 AM 170.28 24627.31 4939.16 1.61 1859.64 32276.31 16282.73 565.06 1.16
10:31:15 AM 193.12 77519.03 5379.96 1.42 53762.75 4495.55 0.00 93.72 2.08
10:31:20 AM 240.24 88966.60 6875.45 1.81 1483.30 0.00 0.00 0.00 0.00
10:31:25 AM 183.50 277.67 8113.28 1.61 4045.47 0.00 0.00 0.00 0.00
10:31:30 AM 202.41 280.08 11409.46 2.82 3114.29 0.00 0.00 0.00 0.00
10:31:35 AM 243.37 255.42 8815.46 2.21 1905.62 0.00 0.00 0.00 0.00
10:31:40 AM 92.37 194.38 5890.96 1.00 1059.84 0.00 0.00 0.00 0.00
10:31:45 AM 283.70 313.08 12742.05 2.21 5263.38 0.00 0.00 0.00 0.00
10:31:50 AM 414.83 11179.96 7938.48 2.00 45495.59 39413.23 0.00 784.17 1.99
10:31:55 AM 198.79 31014.95 9007.47 2.63 2374.95 0.00 0.00 0.00 0.00
10:32:00 AM 235.74 25065.86 10159.84 2.61 1866.47 0.00 0.00 0.00 0.00
10:32:05 AM 202.01 37361.45 11010.24 2.01 3250.00 0.00 0.00 0.00 0.00
10:32:10 AM 256.91 5640.48 7596.59 3.01 3638.08 0.00 0.00 0.00 0.00
10:32:15 AM 246.89 20823.65 5411.42 1.80 1704.21 0.00 0.00 0.00 0.00
10:32:20 AM 114.46 41366.27 6625.30 0.80 1352.41 0.00 0.00 0.00 0.00
10:32:25 AM 188.76 20948.19 25422.09 1.81 8850.20 0.00 0.00 0.00 0.00
10:32:30 AM 177.15 29934.67 9358.52 1.60 54522.65 42292.59 4315.83 1071.14 2.30
10:32:35 AM 237.83 9914.69 9167.40 2.21 2483.50 0.00 0.00 0.00 0.00
10:32:40 AM 207.71 81296.55 8555.17 2.64 2631.85 0.00 0.00 0.00 0.00

The test itself reports latencies over 100 ms and in this mode we occasionally see I/O operations taking around 200 ms reported.

We can change the kernel parameter dynamically while the test is running and see the behaviour change almost immediately:

10:35:15 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
10:35:20 AM 365.06 15634.14 6300.40 3.41 3841.57 0.00 15241.77 2644.18 17.35
10:35:25 AM 333.06 5519.35 9262.10 3.43 8639.31 0.00 92890.32 4528.63 4.88
10:35:30 AM 1158.15 20868.81 10292.96 10.06 12215.09 0.00 255137.22 7858.55 3.08
10:35:35 AM 781.12 41385.54 7742.77 5.02 5841.16 0.00 34506.02 3422.89 9.92
10:35:40 AM 518.10 8764.47 2524.85 3.25 2906.59 0.00 1703326.11 2016.93 0.12
10:35:52 AM 2576.57 39524.85 13449.49 11.31 10332.12 0.00 1153144.24 4256.77 0.37
10:35:57 AM 2707.22 40786.31 7962.55 8.17 9893.92 0.00 4246095.82 6729.66 0.16
10:36:02 AM 1600.75 1889.37 2551.12 4.34 629.04 0.00 3595585.63 253.52 0.01
10:36:16 AM 756.94 39362.58 2063.18 8.25 3785.71 0.00 4238635.01 1814.29 0.04
10:36:21 AM 990.94 9277.31 1584.26 6.24 1692.88 0.00 6222810.91 833.73 0.01
10:36:52 AM 69.73 0.00 116.91 0.96 271.29 0.00 2056531.75 7.20 0.00

The number of pages scanned per second escalates.

Meanwhile I/O latencies reported by the test program escalate up to 36000 ms. We actually have to kill the test program within 30 seconds of changing the kernel parameter to avoid the system becoming so unresponsive it cannot maintain sshd connections.

In our real world Informix example we are not using the page cache anything like as aggressively and when the problem occurs I/O demands reduce as we get down to a single thread in critical section. Thus we don’t see pages scanned at the rate in the test, just a clear increase.

It’s worth mentioning that new NUMA capabilities have been added to the Linux kernel in version 3.8 (and later in 3.13) so RHEL 7 users might see slightly different behaviour.


Simple password encryption

Here’s a blog post on something I didn’t even know existed until recently: Informix simple password encryption. This does what it says in the tin and nothing else: it encrypts the password you use to authenticate to the database server which is sent in the clear otherwise. If you weren’t aware of this then it’s certainly worth reading on!

I don’t know when this feature came into the product but some of my IBM contacts didn’t seem to be aware that it exists either and having tried to use it, it’s apparent that this is a cobweb area of the product with silly niggles stopping it from working properly in a client/server environment.

However it is a documented feature and the documentation is here. I’ve linked to the 11.70 docs but, as far as I can tell, the 12.10 docs are identical.

So why am I not using the more advanced network data encryption which has come into the product more recently, encrypts all your network traffic using SSL and is probably better supported? There are two reasons: this does more than I need and there will be an overhead to using SSL. This may be small but I have not had the chance to quantify it yet.

Below is what your auditor doesn’t want to see. Below is a TCP dump of me connecting to an environment called ‘test 4’ with user test_user and password ABCDefgh1234. I’m using a standard onsoctcp TCP connection.

To do this test I put in a file called connect.sql the code:

connect to 'sb_test4' user 'test_user' using 'ABCDefgh1234';

Then I connect using dbaccess:

$ dbaccess - connect.sql
Connected.
Disconnected.

At the server end I’m capturing traffic and (oh dear):

# tcpdump -nnvvXSs 1514 -i eth0 port 9099 and tcp

0x0000: 4500 01e5 1902 4000 4006 a0a9 ac10 936a E.....@.@......j
0x0010: ac10 93dc 80da 238b 42f5 709c 3748 53b9 ......#.B.p.7HS.
0x0020: 8018 0073 3af7 0000 0101 080a 43e7 699c ...s:.......C.i.
0x0030: f8a3 d51b 7371 4161 3042 5051 4141 7371 ....sqAa0BPQAAsq
0x0040: 6c65 7865 6320 7465 7374 5f75 7365 7220 lexec.test_user.
0x0050: 2d70 4142 4344 6566 6768 3132 3334 2039 -pABCDefgh1234.9
0x0060: 2e32 3430 2041 4141 2342 3030 3030 3030 .240.AAA#B000000
0x0070: 202d 6473 625f 7465 7374 3420 2d66 4945 .-dsb_test4.-fIE
0x0080: 4545 4920 4442 5041 5448 3d2f 2f74 6573 EEI.DBPATH=//tes
0x0090: 7434 5f74 6370 2043 4c49 454e 545f 4c4f t4_tcp.CLIENT_LO
0x00a0: 4341 4c45 3d65 6e5f 5553 2e38 3835 392d CALE=en_US.8859-
0x00b0: 3120 4e4f 4445 4644 4143 3d6e 6f20 434c 1.NODEFDAC=no.CL
0x00c0: 4e54 5f50 414d 5f43 4150 4142 4c45 3d31 NT_PAM_CAPABLE=1
0x00d0: 203a 4147 3041 4141 4139 6232 3441 4141 .:AG0AAAA9b24AAA
0x00e0: 4141 4141 4141 4141 4139 6332 396a 6447 AAAAAAAAA9c29jdG
0x00f0: 4e77 4141 4141 4141 4142 4141 4142 5041 NwAAAAAAABAAABPA
0x0100: 4141 4141 4141 4141 4141 6333 4673 5a58 AAAAAAAAAAc3FsZX
0x0110: 686c 5977 4141 4141 4141 4141 567a 6357 hlYwAAAAAAAAVzcW
0x0120: 7870 4141 414c 4141 4141 4177 414b 6447 xpAAALAAAAAwAKdG
0x0130: 567a 6444 5266 6447 4e77 4141 4272 4141 VzdDRfdGNwAABrAA
0x0140: 4141 4141 4141 6154 4141 4141 4141 4142 AAAAAAaTAAAAAAAB
0x0150: 746e 6457 5668 596d 5630 5958 4277 6448 tndWVhYmV0YXBwdH
0x0160: 4e30 4d44 6375 6332 7435 596d 5630 4c6d N0MDcuc2t5YmV0Lm
0x0170: 356c 6441 4141 4443 396b 5a58 5976 6348 5ldAAADC9kZXYvcH
0x0180: 527a 4c7a 4977 4141 4151 4c32 6876 6257 RzLzIwAAAQL2hvbW
0x0190: 5576 6447 6876 6258 427a 6232 3569 4141 UvdGhvbXBzb25iAA
0x01a0: 4275 4141 5141 4141 4153 4148 5141 4a51 BuAAQAAAASAHQAJQ
0x01b0: 4359 6c39 6b41 4143 6352 4142 7376 6233 CYl9kAACcRABsvb3
0x01c0: 4230 4c32 6c75 5a6d 3979 6257 6c34 4c32 B0L2luZm9ybWl4L2
0x01d0: 4a70 6269 396b 596d 466a 5932 567a 6377 Jpbi9kYmFjY2Vzcw
0x01e0: 4141 6677 00 AAfw.

You can see in bold both the user name and password.

Simple password encryption is easy to set up but, as we’ll see, the manual misses a step or two and a couple of hacks are needed to get this working. All the problems are at the client end and you won’t see them if you test the connectivity entirely within the server using a full engine install.

At the server end we need a concsm.cfg file, which by default lives in $INFORMIXDIR/etc/concsm.cfg but you can override with the easy to remember INFORMIXCONCSMCFG environment variable, which works in a similar fashion to INFORMIXSQLHOSTS.

My server concsm.cfg file contains the following:

SPWDCSM("/opt/informix_test4/lib/csm/libixspw.so", "", "p=1")

I also set up an additional sqlhosts entry:

test4_tcp_secure onsoctcp myserver 9101 csm=(SPWDCSM)

Finally I make sure test4_tcp_secure is listed as one of my DBSERVERALIASES in my onconfig and bounce the instance. Unfortunately I don’t think this parameter is dynamically configurable.

We can of course now test this all with the confines of the server and it will work!

Let’s move onto the client side where things are not quite as straightforward.

One of my mottos is that everyday is a school day and on one day last week I learnt that when you install the Informix server the gskit is installed for you. The gskit is mentioned in the machine notes for the Linux x86_64 release, for example:

14. Secure Sockets Layer

IBM Informix Database Server uses the libraries and utilities provided by
the IBM Global Security Kit (GSKit) for Secure Sockets Layer (SSL)
communication.

a. Before uninstalling GSKit, verify that it is not needed on your
system. It is possible that software other than Informix Database
Server requires GSKit. Uninstall by identifying and removing GSKit
packages using the command-line interface:

Run rpm command with the -qa option to obtain a list of installed
GSKit packages with their exact names.
For example:
rpm -qa | grep gsk

As root user run the rpm command to remove each package as needed.
For example:
rpm -ev gskssl64-8.0.15.6 gskcrypt64-8.0.15.6

b. If you want to restore Secure Sockets Layer capability after you
have uninstalled GSKit, see the readme file in $INFORMIXDIR/gskit
for how to install GSKit.

15. Simple Password Communications Support Module

The name of the IBM Informix shared library for Simple Password CSM on
Linux is libixspw.so.

I’ve also drawn attention to point 15, in case this is different on your platform.

The machine notes led me to check this (still on the server):

> rpm -qa | grep gsk
gskssl32-8.0-15.6.i386
gskcrypt32-8.0-15.6.i386
gskcrypt64-8.0-50.10.x86_64
gskssl64-8.0-50.10.x86_64

At the client end I am using Client SDK 3.70.FC8DE. The set up steps are very similar to the server:

  • Create a concsm.cfg file, optionally using the INFORMIXCONCSMCFG variable.
  • Add a sqlhosts entry, similar to the server.

The concsm.cfg file on the client end is different to that on the server and reflects the different path to the libixspw.so file:

SPWDCSM("/opt/informix/lib/client/csm/libixspw.so", "", "p=1")

However on the client side it may be necessary to manually install the gskit. It is simple to do, run as user root:

$INFORMIXDIR/gskit/installgskit

This step isn’t obviously documented anywhere. I had to resort to strace on dbaccess to find what was wrong when my connection didn’t work:

open("/lib64/tls/libgsk8ssl_64.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/libgsk8ssl_64.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/tls/libgsk8ssl_64.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/libgsk8ssl_64.so", O_RDONLY) = -1 ENOENT (No such file or directory)
munmap(0x7fc5d5299000, 40473) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

The above gave me enough clues to find references to the gskit in the manual and fix the problem.

There is another gremlin as well and without this fix you will see unhelpful error messages like this when using dbaccess:

14581: Cannot open file 'css.iem'.

It turns out it’s necessary to add links to some of the language files, which have different names when distributed with Client SDK to the server.

The above error message can be fixed on the client by doing the following:

cd $INFORMIXDIR/msg/en_us/0333
ln -s ccss.iem css.iem

IBM Informix support also pointed out that this is needed too:

ln -s ccsm.iem csm.iem

It turns out these steps are necessary only for dbaccess and not for applications using IConnect. Hopefully these will be fixed in a future Client SDK version.

Lastly a repeat of my tcpdump test with an encrypted password:

0x0000: 4500 01e7 2471 4000 4006 9538 ac10 936a E...$q@.@..8...j
0x0010: ac10 93dc a457 238d 3bd6 ab8f e3a9 caed .....W#.;.......
0x0020: 8018 0073 1b2e 0000 0101 080a 4418 8fe2 ...s........D...
0x0030: f8d4 fb61 7371 4161 3842 5051 4141 7371 ...asqAa8BPQAAsq
0x0040: 6c65 7865 6320 7465 7374 5f75 7365 7220 lexec.test_user.
0x0050: 2039 2e32 3430 2041 4141 2342 3030 3030 .9.240.AAA#B0000
0x0060: 3030 202d 6473 625f 7465 7374 3420 2d66 00.-dsb_test4.-f
0x0070: 4945 4545 4920 4442 5041 5448 3d2f 2f74 IEEEI.DBPATH=//t
0x0080: 6573 7434 5f74 6370 5f73 6563 7572 6520 est4_tcp_secure.
0x0090: 434c 4945 4e54 5f4c 4f43 414c 453d 656e CLIENT_LOCALE=en
0x00a0: 5f55 532e 3838 3539 2d31 204e 4f44 4546 _US.8859-1.NODEF
0x00b0: 4441 433d 6e6f 2043 4c4e 545f 5041 4d5f DAC=no.CLNT_PAM_
0x00c0: 4341 5041 424c 453d 3120 3a41 4730 4141 CAPABLE=1.:AG0AA
0x00d0: 4141 3962 3234 4141 4141 4141 4141 4141 AA9b24AAAAAAAAAA
0x00e0: 4141 3963 3239 6a64 474e 7741 4141 4141 AA9c29jdGNwAAAAA
0x00f0: 4141 4241 4141 4250 4141 4141 4141 4141 AABAAABPAAAAAAAA
0x0100: 4141 4163 3346 735a 5868 6c59 7741 4141 AAAc3FsZXhlYwAAA
0x0110: 4141 4141 4156 7a63 5778 7041 4141 4c41 AAAAAVzcWxpAAALA
0x0120: 4141 4141 7741 5264 4756 7a64 4452 6664 AAAAwARdGVzdDRfd
0x0130: 474e 7758 334e 6c59 3356 795a 5141 4161 GNwX3NlY3VyZQAAa
0x0140: 7741 4141 4141 4141 4c43 6a41 4141 4141 wAAAAAAALCjAAAAA
0x0150: 4141 625a 3356 6c59 574a 6c64 4746 7763 AAbZ3VlYWJldGFwc
0x0160: 4852 7a64 4441 334c 6e4e 7265 574a 6c64 HRzdDA3LnNreWJld
0x0170: 4335 755a 5851 4141 4177 765a 4756 324c C5uZXQAAAwvZGV2L
0x0180: 3342 3063 7938 794d 4141 4145 4339 6f62 3B0cy8yMAAAEC9ob
0x0190: 3231 6c4c 3352 6f62 3231 7763 3239 7559 21lL3Rob21wc29uY
0x01a0: 6741 4162 6741 4541 4141 4145 6742 3041 gAAbgAEAAAAEgB0A
0x01b0: 4355 416d 4a66 5a41 4141 6e45 5141 624c CUAmJfZAAAnEQAbL
0x01c0: 3239 7764 4339 7062 6d5a 7663 6d31 7065 29wdC9pbmZvcm1pe
0x01d0: 4339 6961 5734 765a 474a 6859 324e 6c63 C9iaW4vZGJhY2Nlc
0x01e0: 334d 4141 4838 00 3MAAH8.

The password has gone. Where has it gone? It’s a bit hard to say given it’s now encrypted but having checked the tcpdump for the entire session I am sure that:

  • It is not sent in the clear.
  • We are not authenticating by hosts.equiv or any other passwordless means.
  • Sending the wrong password leads to a log on failure.

Auditing and onaudit

Compliance is one of those things you can hardly ignore as a DBA these days. Whether it’s a PCI-DSS, financial or internal best practice audit, at some point someone is going to ask you whether you are using database auditing. In my experience the auditors struggle to ask Informix specific questions but this is one that always comes up.

I guess there are three answers to this question:

  • Yes, we use database auditing.
  • No we don’t use Informix auditing but we have a third party solution somewhere else in the stack that means someone else worries about it.
  • Can we have a compensating control, please?

Rarely I find that auditors are too concerned about the detail of what you’re actually auditing. If you can log in, do some stuff and show them that this resulted in some messages in the audit log, they are usually happy. They are usually more concerned about where the logs go, who can read them and so on.

While the auditors are clearly not Informix DBAs familiar with all the auditing pneumonics, they are not daft and know they can take most of what they asked for granted next year and ask more probing questions next time.

So should you look at onaudit for your requirements? It’s been around a long time but I expect it may see a pick up in interest as more and more systems take payments in one way or another. In some ways it could do with some updates. Integration with syslog, allowing easy upload to a centralised question, is needed. There is an RFE open for this (id 58678). It’s not mine but it had six votes when I last checked and it deserves more!

Positives about onaudit include:

  • It’s free with all editions.
  • Provided you stay away from selective row auditing (I don’t cover this in this blog post) and don’t try to audit much or any of what your application does the overhead is negligible.
  • It gives you as a DBA a clearer idea of what is happening on your system.

So I think it’s certainly worthy of consideration. I know some customers prefer security solutions external to the database like Guardium but these are costly. I don’t know much about them so I shall leave that thought there.

Auditing needs to be part of a more general secure framework. If everyone at your site logs in as user informix or any shared account, the worst case being the same account as your application, it’s not going to be as useful. Applying rules by user will be difficult or impossible.

Some sites I’ve seen let DBAs do all their work as user informix. It definitely saves developing a more secure framework for DBAs to work in (this is not a good thing!) but has disadvantages. Even if you avoid shared passwords by using sudo to informix (on UNIX) having logged in as yourself, you’d need then to cross-check with the secure logs on your server to see who it was and if two people have escalated privileges at the same time it can be tricky to distinguish their actions. Ideally you need DBAs and every other real person working under their own user ids as much as possible.

To work as a DBA without access to the informix account you simply add yourself to the same group as the group owning the $INFORMIXDIR/etc folder and grant yourself dba in any databases you need to do DDL in, plus sysmaster, sysadmin, sysutils, sysha and sysuser but it still presents the following challenges which may require specific sudo type solutions:

  • Starting an instance; stopping one is no problem.
  • Running xtrace and certain oncheck commands.

Additionally as a DBA you may need root access occasionally for installations, upgrades and to use debuggers.

So before you even start there are some highly desirable prerequisites:

  • Your applications use their own account or (ideally) accounts and real users cannot run ad-hoc sessions using these.
  • Real users don’t use shared accounts (and especially not shared passwords). This means locking down the informix account.
  • DBAs practise what they preach and administer the system under their own accounts as much as possible.

Getting this far can be a struggle but even if you’re only some of the way there, you can still proceed.

The next step is consider whether to install Informix with role separation. I’m not going to discuss this at length so I’ll point to the documentation. There are no real gotchas here: it works pretty much as it says on the tin. The key idea is that it separates the DBAs from the people who decide what should be audited and who can see the audit trail. In practice I think total separation is impossible because the people deciding what should be audited need to understand the impact on the system of what they audit and the volume of data this produces. It is certainly possible to slow a system down by auditing every update.

So you’re now ready to switch on auditing? Nearly. If you monitor your system via onstat or have scripts which call onmode, ‘onmode -c [un]block’ being a specific example where care is required, you need to be aware that in all but the latest Informix releases, this includes right up to 12.10.FC5W1, as soon as you switch on auditing your onstat and onmode commands will run more slowly. This can also affect admin API command equivalents and not just the ones which are direct equivalents for onmode. The situation can get quite bad when lots of these commands run at the same time, leading to significant delays in the response from these commands.

Fortunately there are some fixes for this:

IC89645
TURNING ON THE AUDITING LEVEL 1 ADDS AN UNNECESSARY DELAY TO ONSTAT AND ONMODE COMMANDS

This has been around for a while and appeared in 11.70.FC7W1. However it is not very effective and only eliminates the delay if the volume of onstat commands being run on your system is low.

IT08812
TURNING ON THE AUDITING LEVEL 1 ADDS AN UNNECESSARY DELAY TO ONSTAT & ONMODE COMMANDS

This is completely effective and means that onstat and onmode behave identically to when auditing is switched off but it only works if you do not have any audit masks which log the use of these commands.

There are workarounds for the auditing delay such as using sysmaster equivalents for the onstat commands and performing onstat commands inside an onstat -i interactive session.

Finally you’ll want to consider setting up some audit masks. I take the following approach to this:

_require mask

This mask defines the minimum events to be audited for all users. I put everything that’s common in here.
_default mask
If an account is not assigned a specific mask, it will pick up all the events in there. To avoid having to assign masks to all real users, I don’t assign them any mask and then they automatically inherit this one (in addition to what is in the _require mask).
Other masks
For my applications and other accounts needing special treatment, I create a custom mask and assign it to the user.

Finally if you’re feeling brave switch auditing on with some commands like:

onaudit -p /path/to/audit/trail
onaudit -s 1048576 # 1 Mb files
onaudit -e 0
onaudit -l 1

Now there is just that application security model for you to tackle.

Good luck and may you sail through your audits!