Monitoring Informix with Grafana


In a presentation I gave at IIUG 2017 titled Making system monitoring better I showed, without much detail, how Grafana is a powerful tool for visualising what is happening within your Informix server. Performance metrics from your database server are collected at regular (usually 10 second) intervals and stored in a time-series database which can be used as the source for dashboards containing dynamic graphs and other ways of presenting the data. For the DBA the benefits are legion:

  • Quickly change the time range to zoom into when problems occurred or zoom out to see trends.
  • Correlate various database metrics and combine them with related operating system, network, storage or application metrics.
  • Get a truer picture of your busy periods, capacity, the effect of scheduled jobs etc.
  • Faster problem resolution because a lot of data can be visualised at once.

You might be able to think of some others.

The talk also touched on the CAMS acronym: Culture, Automation, Measurement and Sharing.


So you shouldn’t keep your dashboards to yourself: share them with other technical teams or everyone in your company. This has the added benefit of more eyes on the data, and others can learn to spot database problems, or to recognise when a problem is probably not a database problem, by referring to these dashboards.

Why Grafana?

There are a number of tools which appear to do a similar job, among them Brunia, AGS Sentinel and open source options such as Prometheus, Grafana and Graphite.

You perhaps haven’t heard of Brunia: it is the code name for a prototype monitoring tool that may replace Informix Open Admin Tool (OAT) in the future. It was demonstrated at IIUG 2017 and is probably closest to Prometheus in its execution. AGS Sentinel is the monitoring add-on to the popular ServerStudio suite for Informix. The rest are popular open source tools which other teams in your organisation are probably already using.

Some of the tools listed above can also produce events or alerts when a trigger condition occurs and automatically pass this up a stack to PagerDuty or another call-out system. An example of such a stack is Prometheus -> Alertmanager -> PagerDuty -> StatusPage

There are a lot of ways of implementing a full monitoring stack with choices to make about data collection, storing, visualisation, analysis and alerting. In this blog post I am going to concentrate on a simple set up where we collect Informix metrics, pass them to InfluxDB using a REST API and visualise in Grafana. For a fuller discussion of the benefits of the three open source technologies mentioned above I highly recommend reading this blog post from Loom Systems written in June 2017, Prometheus vs. Grafana vs. Graphite – A Feature Comparison.

In case you’re not going to read the LS blog it’s worth emphasising what the InfluxDB/Grafana solution I am about to describe does not provide:

  • There is little in the way of monitoring and alerting features.
  • Regression models enabling you to predict the value of a metric in the future are not available.
  • Advanced time series functions are not available.

The solution would be richer if Graphite was used as the data source and Grafana for visualisation only. This would provide more aggregation functions and allow you to do things like subtract one time series from another. As an example of what this might provide, I have a dashboard (not covered in this blog post) displaying the buffer turnover ratio and buffer waits ratio over an arbitrary moving window irrespective of when onstat -z was last run.

It is easy to confuse Graphite and Grafana, especially as both can be used independently, or Graphite can be a data source for Grafana.

As this is an Informix blog I ought to explain why I am using InfluxDB and not Informix time series. The simple answer is that to use Informix time series with Grafana properly someone would have to write and maintain a data source plugin for it like the one for InfluxDB we’ll be using. Doing so would give something more feature rich than InfluxDB for sure but perhaps not much more powerful than a Graphite/Grafana combination.

What to monitor

Potentially anything we can put a value to every ten seconds can be collected and stored in InfluxDB (which is a statement you can make about time series collections in general). For Linux operating system metrics there is a well-established collection daemon called collectd and, if I had better C programming skills, I could write a collectd plugin for Informix.

For Informix systems the most obvious source is the system monitoring interface (SMI) which is the presentation of information held in shared memory through pseudo-tables in the sysmaster database. This covers the vast majority of what can be collected using onstat but is easier to handle in a programming language. Doing it this way means we can also collect real table data in the same manner.

For example the system profile or onstat -p can be captured with the following SQL, reading the sysprofile pseudo-table in sysmaster:

SELECT
  TRIM(name) AS name,
  value
FROM
  sysprofile
WHERE
  name IN ('dskreads', 'bufreads', 'dskwrites', 'bufwrites', 'isamtot', 'isopens', 'isstarts', 'isreads', 'iswrites', 'isrewrites', 'isdeletes', 'iscommits', 'isrollbacks', 'latchwts', 'buffwts', 'lockreqs', 'lockwts', 'ckptwts', 'deadlks', 'lktouts', 'numckpts', 'plgpagewrites', 'plgwrites', 'llgrecs', 'llgpagewrites', 'llgwrites', 'pagreads', 'pagwrites', 'flushes', 'compress', 'fgwrites', 'lruwrites', 'chunkwrites', 'btraidx', 'dpra', 'rapgs_used', 'seqscans', 'totalsorts', 'memsorts', 'disksorts', 'maxsortspace');

It’s important to understand that all of these metrics are what I’d term counters. That is, they only increase over time (unless they get so large they run out of bits and wrap, or a DBA runs onstat -z). It gets difficult to see on a graph the difference between, say, 2394472 and 2394483 and so it’s useful to calculate a delta over the ten second window. Some things you might collect are automatically suitable for graphing because they are gauges: an example of this is the number of threads in your ready queue at any given moment.
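If you store the raw counter values instead, the delta can also be derived at query time in InfluxDB. Here is a sketch in InfluxQL, assuming a hypothetical measurement called sysprofile with a bufreads field; the actual names will depend on your collector configuration:

SELECT non_negative_derivative("bufreads", 10s) FROM "sysprofile" WHERE time > now() - 1h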


Practical demonstration with Docker containers

Nothing better than an example you can try at home (or work!). In the implementation example I will be using the IBM Informix Developer Edition Docker container which, at the time of writing, runs Debian 8 (jessie) and a second Docker container for InfluxDB and Grafana. You’ll of course need Docker installed on your laptop or workstation for this to work.

What this demonstration is going to build is a small monitoring pipeline: a collector script will collect metrics from Informix at a regular interval and post the results to InfluxDB. You will be able to use your usual web browser to connect to Grafana and visualise the data. Sounds simple?

We’ll start by setting up the InfluxDB/Grafana Docker container, which will also be based on a (minimal) Debian installation. In a terminal run:

docker pull debian
docker run -it --name influx_grafana_monitoring -p 8086:8086 -p 3000:3000 --hostname grafserv -e "GF_SECURITY_ADMIN_PASSWORD=secret" debian

Your terminal should now be inside the Docker container and logged in as root. Run these commands to install some extra packages and then InfluxDB:

apt-get update
apt-get -y install curl gnupg apt-transport-https procps
curl -sL | apt-key add -
echo "deb jessie stable" | tee -a /etc/apt/sources.list
apt-get update
apt-get -y install influxdb

Next install Grafana in the container:

echo "deb jessie main" | tee -a /etc/apt/sources.list
curl | apt-key add -
apt-get update
apt-get -y install grafana

Start both services inside the container:

/etc/init.d/influxdb start
/etc/init.d/grafana-server start

We need to create an Influx database to store our time series data and we can do this with a REST API call:

curl -i -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE informix"

If it works you should see an HTTP/1.1 200 OK response.
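If you want to double-check the database is accepting writes, you can post a test point using the InfluxDB line protocol and query it back; the measurement and field names below are arbitrary:

curl -i -XPOST 'http://localhost:8086/write?db=informix' --data-binary 'smoke_test,host=grafserv value=1'
curl -G 'http://localhost:8086/query?db=informix' --data-urlencode 'q=SELECT * FROM smoke_test'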

You should now be able to access the Grafana server running in your Docker container from your usual web browser at http://localhost:3000/

Log in with the user name admin and the password secret. Once logged in click Add data source, choose InfluxDB as the type and fill in the settings as follows (some of them are case-sensitive):

HTTP settings
URL: http://localhost:8086 (InfluxDB runs in the same container as Grafana)

Basic auth
Leave unticked
With credentials
Leave unticked

InfluxDB Details
Database: informix (the database created earlier)
User: Leave blank
Password: Leave blank
Min time interval
Leave at 10s

All being well you should see Data source is working in a big green box.

Now we are going to set up the Informix container to monitor. On your workstation in another terminal run:

$ docker pull ibmcom/informix-developer-database
docker run -it --name iif_developer_edition --privileged -p 9088:9088 -p 9089:9089 -p 27017:27017 -p 27018:27018 -p 27883:27883 --hostname ifxserver -e LICENSE=accept ibmcom/informix-developer-database:latest

The command above should provide a running Informix instance; this may take a few moments, after which control is passed back to the terminal. Next we will install the prerequisites for the script that will send data to InfluxDB. Inside the Informix container run:

sudo -i
apt-get update
apt-get -y install git libdbi-perl libjson-perl libwww-curl-perl make gcc libtest-pod-perl

We need to get the Perl DBD::Informix package from CPAN, which will download, compile, test and install it for us. First set up the environment:

. /home/informix/ifx_dev.env
export DBI_DBNAME=sysmaster
export DBD_INFORMIX_DATABASE=sysmaster

Then start the CPAN console by running cpan. On first use, enter ‘yes’ to configure as much as possible. In the CPAN console type the case-sensitive command:

install DBD::Informix

There is quite a lot that could go wrong in the CPAN console but it should work if you’re using the IBM Informix DE Docker container and follow the steps exactly. If you’re installing on RedHat Linux or a derivative the required RPM package names you use with yum install will all be different.
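Before moving on it can be worth confirming the driver built correctly and can reach the sysmaster database. A minimal test script along these lines should print the server version; it assumes the environment set up above is still in effect:

#!/usr/bin/perl
# Quick DBD::Informix smoke test: connect to sysmaster and print the server version
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Informix:sysmaster', '', '', { RaiseError => 1 });
my ($version) = $dbh->selectrow_array(
    q{SELECT DBINFO('version', 'full') FROM systables WHERE tabid = 1}
);
print "Connected to: $version\n";
$dbh->disconnect;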

Type logout to exit the root shell. You should be logged in as user informix again. Leave this session for later.

Run on your local workstation (outside both Docker containers) in another terminal window:

git clone

This assumes you have git installed. There are two important files in the post_to_influxdb directory:

  • informix_smi_influx_uploader
  • informix_smi.json

You will need to edit informix_smi.json and change all references from mydatabase to the name of the user/application database you want to monitor. For the purposes of this demo we are just going to monitor the sysmaster database, so change all mydatabase references to sysmaster.

You can copy the files to your Informix Docker container as follows. To get the name of your Informix Docker container (not its hostname) simply type docker ps on your workstation.

$ docker ps
5a9c73712429 debian "bash" 8 minutes ago Up 8 minutes>3000/tcp,>8086/tcp influx_grafana_monitoring
e1c178b57164 ibmcom/informix-developer-database:latest "/bin/bash informi..." 13 minutes ago Up 13 minutes>9088-9089/tcp,>27017-27018/tcp,>27883/tcp iif_developer_edition

From the above my Informix container name is e1c178b57164.

docker cp informix_smi_influx_uploader e1c178b57164:/home/informix
docker cp informix_smi.json e1c178b57164:/home/informix

We should be ready to start collecting metrics and posting them to InfluxDB. Run in the Informix container as user informix:

./informix_smi_influx_uploader -c ./informix_smi.json -i 10 -u http://other_container_ip:8086/write?db=informix

Change other_container_ip to the IP address of your InfluxDB/Grafana container. You must use the IP address unless you have name resolution, which this basic Docker set-up does not provide. If you don’t know the IP address you can find it by pinging the container’s hostname from inside the InfluxDB/Grafana container with a command like ping -c 1 grafserv
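Alternatively, Docker itself can report the container’s IP address (assuming the default bridge network):

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' influx_grafana_monitoring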

All being well the Perl script should run continuously and collect and post data to InfluxDB every 10 seconds or whatever interval you specified with the -i switch.
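You can confirm data is arriving by querying InfluxDB from inside the InfluxDB/Grafana container; the measurement names you see will depend on the entries in informix_smi.json:

curl -G 'http://localhost:8086/query?db=informix' --data-urlencode 'q=SHOW MEASUREMENTS'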

To see anything in Grafana we’ll need to set up a dashboard. The file informix_server.json in the grafana_dashboard directory describes a Grafana dashboard. You’ll need to edit it a bit first and change all occurrences of the following:

  • <%= @hostname %>: the hostname of your Informix docker container, normally the unqualified hostname of your Informix server.
  • <%= @informixserver %>: the name of your Informix instance, dev in this example.

In the Grafana web browser click the Grafana logo and then Dashboards. Click Import. At Import Dashboard click Upload .json File. Hey presto, you should have a dashboard with graphs. Some may not display any data, e.g. Temporary dbspace usage, because there are no temporary dbspaces in the development Docker image by default.

Making this ready for production

There are a few bits I’ll leave to you for any production implementation:

  • The collection shouldn’t run as user informix. Create a user for the monitoring and give it just the CONNECT and SELECT privileges it needs.
  • You’ll also need to write a script to start/stop the collection with the instance.
  • Linux operating system statistics, gathered through collectd, would complement the dashboard very nicely.
  • You’ll probably want to customise both the JSON file containing the SMI queries and the one describing the dashboard. You could add application metrics that can be collected from the database or create multiple dashboards, especially if you don’t like the idea of one big one showing everything. Bear in mind any queries you write need to be fast and will run every 10 seconds.
  • In the dashboard you may need to add/remove different page sizes to/from the buffer, page and disk reads/writes graphs.


It can be very useful to track virtual segment and temp. space usage on your server and correlate this with events like update stats, ontape/onbar backups or application activity. You can use other tools to do this but these are often not as accessible or are purely in the realm of the DBA. A Grafana dashboard like the one described here should be very useful for you and your colleagues, especially if they have their own dashboards on the same system which allow you to view your systems as a whole, and it might go some distance to demystifying Informix in your organisation.


Passwords encrypted over the network: why is this feature not enabled by default?

In 2015 I wrote a blog post about using simple password encryption (SPW) and how – without it – your valuable passwords can be trivially sniffed on your network. If you look through the post it illustrates the vulnerability and shows just how easy it is to set your system up in a more secure way.

SPW only encrypts your password when you connect. Not everyone wants or needs full encryption of all their traffic but what reasons are there not to use SPW?

  • It requires a small amount of extra set-up work, although this can (should?) be automated.
  • It means your database engine spawns some extra cssmbox_cn threads, although they are only used at connection time and the overhead is low.
  • Consideration should be given to patching the IBM Global Security Kit (GSKit) separately from the server and client, both of which bundle it.

I don’t know of any other drawbacks. In my opinion these are nothing substantive when weighed against your peace of mind.

If you have Fix Central access you can always download the latest GSKit from here. Although it’s used by many IBM products it’s filed under Tivoli which isn’t obvious at all.

Patching the GSKit separately isn’t necessarily something you need to do but it isn’t only used by SPW: if you’ve set ENCRYPT_HDR, ENCRYPT_SMX or ENCRYPT_CDR, for example, you are using it. The GSKit doesn’t get installed in INFORMIXDIR; it’s installed by RPM (on Linux) to /usr/local/ibm and only one version can exist on your server. So if you’re used to pre-installing a new version of Informix server or Client SDK in its own folder prior to an upgrade, be aware that you may just have unwittingly upgraded the GSKit.
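On an RPM-based system you can see which GSKit packages are currently installed, and therefore whether an Informix or CSDK install has just changed them, with something like:

rpm -qa | grep -i gsk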

The feature has suffered a few issues lately and is currently broken when used with the Informix JDBC driver in 11.70.xC9; connections supported by CSDK or IConnect work fine. I think the feature would be more dependable if more people used it (or if the product testing stress tested this area). Here are some relatively recent issues:

  • All recent JDBC drivers earlier than 4.10.JC8 (including 3.70.JC8W1) suffer from an issue where a small proportion of connections will fail. You might not notice this if your application can capture logon failures and retry automatically. There is no APAR for this that I know of as 4.10.JC8 was extensively reworked for JDBC 4.0 support.
  • Informix 11.70.xC9 contains fix IT10493 but this caused a high rate of logon failures with SPW and fix IT17087 is additionally needed but not included.
  • If you’re using the 12.10 code line you need xC8 or later to get the same fix.
  • CSDK 4.10.FC8 ships with an incompatible GSKit version but actually requires a different one (APAR IT18763). You may not notice this, however, if your server software ships with a later version.

I hope this doesn’t come across as a moan, more a call to action.

When do my stored procedure execution plans get updated?

For the sake of brevity in this article I am going to group procedures, functions and routines together as stored procedures and ignore any differences between them.

What does the SQL command UPDATE STATISTICS FOR PROCEDURE/FUNCTION/ROUTINE do and, perhaps more pertinently, as a DBA do I need to run it regularly to ensure my systems are working efficiently? For those wanting an immediate answer, I think it is “never” or “almost never”, the reasons for which I hope to explain clearly in this article.

The command itself is straightforward: calling it causes Informix to parse a stored procedure’s code and produce a query plan for all of it based on the current statistics and data distributions (if the procedure references any tables). It then writes the query plan to the sysprocplan table which is, unless you have an unlogged database, a logged operation written to the logical log. Used with no parameters it does this for all stored procedures in the system.
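For reference, the statement can target a single routine or, with no name given, every routine in the database; myproc below is just a placeholder:

-- Rebuild the stored query plan for one procedure
UPDATE STATISTICS FOR PROCEDURE myproc;

-- Rebuild the stored query plans for all procedures in the current database
UPDATE STATISTICS FOR PROCEDURE;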

As long as the query plan in sysprocplan is reasonably efficient there is probably no need to (ever) proactively update it but there may be cases when you’d want to do so, for example, if a very small or empty table has grown into a large one. However if you were to do this your new plan would be based on the current table statistics and data distributions and if these haven’t been updated yet you may get the same, now inefficient, plan.

The manual states:

The sysprocplan system catalog table stores execution plans for SPL routines. Two actions can update the sysprocplan system catalog table:

  • Execution of an SPL routine that uses a modified table
  • Execution of the UPDATE STATISTICS FOR ROUTINE, FUNCTION, or PROCEDURE statement

There is a created column in the sysprocplan table but it’s a date and not a date/time, which makes it much harder to match plan updates to other events.
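Even so, you can get a rough picture of when plans were last regenerated by joining sysprocplan to sysprocedures. A sketch, assuming the procid and created columns as I recall them from the system catalog:

SELECT p.procname, MAX(pl.created) AS plan_created
FROM sysprocedures p, sysprocplan pl
WHERE p.procid = pl.procid
GROUP BY p.procname
ORDER BY 2 DESC;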

So what is a modified table? Quite simply it is one where the version number has been incremented. You can see the version number with an SQL query like:

select version from systables where owner='myowner' and tabname='mytable';

I think the only reference to this in the manual is in the section about systables, where it simply says:

Number that changes when table is altered

How the engine works out the dependencies a stored procedure has on different tables falls into the category of system internals, which IBM chooses not to publicly document, but I think it’s safe to say that if a table is referenced anywhere in a procedure it is dependent on it.

There are many ways a table can be “altered”, some more obvious than others:

Method Version number incremented by
ADD column 131072
DROP column 131072

I am not sure why some operations increment the value by large numbers, all powers of 2, as any increment has a similar effect, at least as far as the scope of this article is concerned.

The table is not a complete list because there are many possible DDL operations but this does already illustrate or suggest that:

  • On most systems it’s likely that UPDATE STATISTICS commands will be the main trigger for stored query plans to be updated. If you run LOW, HIGH and MEDIUM modes for a table, as you will if you use AUS or dostats, you’ll trigger at least three updates for dependent stored procedures (if they are called).
  • If we want to grant multiple privileges on the same table, it’s best to do it in a single statement: that way, if a dependent stored procedure is called by an application in between our commands, its stored execution plan will be updated only once rather than once per GRANT.
  • GRANT DBA is not a table level operation yet it has an effect.

Further testing shows that both the GRANT DBA and REVOKE DBA statements increment the version number on all tables in the same database where the tabid is 100 or greater, that is all user tables. From the manual above it follows that the stored query plans for all stored procedures or functions dependent on a table will be updated the next time they are executed.

On our systems we see a large amount of writes to sysprocplan after granting or revoking the DBA privilege to anyone. When graphed we see a sharp peak and a long exponential tail off as less commonly used procedures get called.

Therefore if you grant DBA to a user on a busy live system, it can affect concurrency more than you might expect. On an idle system you may want to run UPDATE STATISTICS FOR PROCEDURE immediately afterwards to update the stored query plans in an orderly way and save the first session to call any given procedure from this overhead.

I think running the command offline to avoid the overhead for user or application sessions is possibly the only true use case for this command.


Improving remote query performance by tuning FET_BUF_SIZE

I thought I’d write a blog post as a nice example of where tuning the client-side variable, FET_BUF_SIZE, really speeded up a remote query.

FET_BUF_SIZE is documented by IBM in the context of a Java application using JDBC here and as a server environment variable here.

One thing the documentation warns about is that simply setting this to a high value may degrade performance, especially if you have a lot of connections. With that in mind here are some facts about the query I’m running and using as a basis for these tests:

  • I am just using a single connection to the database.
  • The query returns around 10000 rows and 60 MB of data.
  • The client and the server are geographically separated from each other: Art Kagel’s dbping utility typically takes around 0.1 seconds to connect remotely, compared with around 3 milliseconds locally.
  • Crucially, the query runs in seconds locally on the server but takes over three minutes when run remotely.
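For completeness, FET_BUF_SIZE is simply set in the client’s environment before the query is run. A minimal sketch, assuming dbaccess as the client and a hypothetical SQL file:

export FET_BUF_SIZE=65536
time dbaccess mydatabase remote_query.sql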

If I begin running the query with the default value of FET_BUF_SIZE and monitor waits on the server, I can see that reads only go up slowly and that my session is waiting on a condition (indicated by the Y in position one of column two) more or less all the time:

> while [ 1 ] ; do
> onstat -u | grep thompson
> sleep 1
> done
address flags sessid user tty wait tout locks nreads nwrites
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 552 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 552 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 560 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 560 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 568 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 576 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 592 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 624 0
26eb492d18 Y--P-R- 76228 thompson 0 26e67cd298 0 0 624 0

The sixth column (wait) shows the address of the condition my session is waiting on. I can use onstat -g con (print conditions with waiters) to see that I’m waiting on the network:

> onstat -g con | grep -E '^cid|26e67cd298'
cid addr name waiter waittime
5789 26e67cd298 netnorm 84353 0

A quick check with onstat -g ses 76228 shows that thread id 84353 does indeed correspond to my session.

While the wait time shown above is not increasing it’s a different story when we look at netstat, again on the server:

> netstat -nc | grep ''
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1312 ESTABLISHED
tcp 0 1284 ESTABLISHED
tcp 0 1306 ESTABLISHED
tcp 0 1302 ESTABLISHED
tcp 0 1194 ESTABLISHED
tcp 0 1206 ESTABLISHED
tcp 0 1266 ESTABLISHED
tcp 0 1304 ESTABLISHED
tcp 0 1318 ESTABLISHED
tcp 0 1248 ESTABLISHED

What the above is showing us is that there are consistently around 1200 to 1300 bytes in the send queue (Send-Q). This is surely our bottleneck.

At this point when investigating the problem I considered modifying other parameters such as OPTOFC and Linux kernel parameters. However with a few moments’ thought it was clear these weren’t going to gain anything: OPTOFC optimises the open-fetch-close sequence and for a single long running query this is not going to give us anything measurable; and an investigation into increasing the Linux kernel parameter related to the send queue size was dismissed when we found that 1300 bytes was well below the maximum allowed.

In Informix 11.50 the maximum value of FET_BUF_SIZE is 32767 (32 KB) but this is increased to 2147483648, or as we’ll see actually 2147483647 (2 GB), in 11.70 and above. We can therefore move on to experimenting with different values:

FET_BUF_SIZE                      Query run time (s)  Average Send-Q size over 10 samples  Maximum Send-Q size observed
Default                           221.2               1274                                 1332
1024                              221.1               1255                                 1326
2048                              221.1               1285                                 1338
4096                              221.2               1297                                 1360
6144                              102.1               2564                                 2676
8192                              56.6                5031                                 5210
16384                             22.6                12490                                13054
32767 (max. 11.50 value)          11.5                24665                                29968
65536                             7.0                 62188                                62612
131072                            4.9                 115793                               127826
262144                            4.0                 146686                               237568
524288                            3.5                 184320                               249856
1048576                           3.3                 245760                               473616
2097152                           3.2                 249856                               486352
2147483647 (max. value - 1)       3.0                 245760                               549352
2147483648 (supposed max. value)  221.3               1276                                 1366

As the run times get shorter it gets tricky to measure the Send-Q using netstat -nc, so it can be sampled very frequently using a loop like:

while [ 1 ] ; do
netstat -n | grep ''
done

This will produce many measurements per second and with this it’s possible to see the queue fill up and drain several times in the period while the statement is running.

It’s also interesting to play around with the boundaries. For example, with a FET_BUF_SIZE of between around 5500 and 5600, maximum Send-Q sizes the same as those consistently achieved with a FET_BUF_SIZE of 6144 begin to creep into the results, but many measurements remain around the values consistently measured with a FET_BUF_SIZE of 4096:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 1316 ESTABLISHED
tcp 0 1318 ESTABLISHED
tcp 0 1278 ESTABLISHED
tcp 0 1352 ESTABLISHED
tcp 0 1288 ESTABLISHED
tcp 0 2546 ESTABLISHED
tcp 0 1278 ESTABLISHED
tcp 0 2502 ESTABLISHED
tcp 0 1266 ESTABLISHED
tcp 0 1314 ESTABLISHED
tcp 0 2506 ESTABLISHED
tcp 0 1292 ESTABLISHED

So what are the conclusions?

  • Increasing FET_BUF_SIZE at the client side can dramatically improve the speed of remote queries.
  • Maximum Send-Q sizes, as measured by netstat, increase in discrete steps as FET_BUF_SIZE is increased.
  • A larger Send-Q allows more data to be cached and reduces waits seen in Informix.
  • To see any improvement at all FET_BUF_SIZE must be increased to at least 6000 (approximate value).
  • Around boundaries between maximum Send-Q sizes there appears to be a cross-over region where maximum send queue sizes from two adjacent values overlap, varying from one second to the next.
  • The maximum value allowed in 11.70 at least is 2147483647 and not 2147483648, as indicated in the documentation.
  • The maximum 11.50 value of 32767 produced a run time nearly 4x slower than an optimised value for 11.70+.
  • Other testing I did, not documented here, shows that the results are uniform across JDBC and ESQL/C applications.

Note: all user names, IP addresses and port numbers used in this post have been altered.


Informix or Client SDK install: No Java virtual machine could be found

This is something of a note to self. For some time it has been the case that you may see this message when attempting an Informix server or Client SDK install if there is a problem starting the installer’s Java runtime environment:

# LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64 ./ids_install
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
No Java virtual machine could be found from your PATH
environment variable. You must install a VM prior to
running this program.

To add insult to injury when this condition occurs the installer exits with status code zero, suggesting all is ok.

Now the obvious thing to do seems to be to install a Java package, wondering whether OpenJDK will suffice or the official Oracle version is needed. This is never the answer! The Informix installer comes bundled with its own Java run time environment (JRE) which gets extracted into /tmp/install.dir.X and your challenge is in fact to find out why it isn’t working as it should.

You can see in my attempt at installing the product I have already prefaced the command with LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64. This is already a known way of fixing some installation problems. (For a 32-bit version you’d simply use /usr/lib:/lib.)

Everyone’s friend, strace, is a great way to start investigating this problem. In amongst the output I find this:

faccessat(AT_FDCWD, "/tmp/install.dir.12813/Linux/resource/jre/jre/bin/java", X_OK) = -1 EACCES (Permission denied)

So why is this? I am logged in as root so I ought not be running into permission denied issues.

The core problem here is the way /tmp, which is a separate filesystem on my machine, is mounted. From the mount command output:

tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)

The key part here is the noexec flag which is a security feature preventing execution of binary files residing on this filesystem.

The best way to fix this is to set the environment variable IATEMPDIR to a directory on a filesystem where execution is allowed. I usually use /root for this purpose. And success:

# export IATEMPDIR=/root
# LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64 ./ids_install
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...

Launching installer...

Preparing CONSOLE Mode Installation...

While the above should be sufficient I have seen the server installer still fail to work even with this environment variable set as some files may still be placed in /tmp. In this situation you can temporarily remove the security restriction with:

mount -o remount,rw,nosuid,nodev,relatime,exec /tmp

and switch it back on again with:

mount -o remount,rw,nosuid,nodev,relatime,noexec /tmp

I suggest before running the above you check the existing mount options for your /tmp filesystem.


Intermittent “CSM: authentication error” with JDBC

This article will only concern you if:

  • you connect using JDBC.
  • you use simple password encryption.

Simple password encryption just does one thing: it encrypts the password sent to the database server in transit, preventing it from being obtained by network packet sniffing.

At the client end configuring simple password encryption can be done simply by adding SECURITY=PASSWORD to your JDBC connection string (you can see this at the end of the example URL in the test program below).

At the server end set up a DBSERVERALIAS, add the CSM option to the fifth field in sqlhosts and set environment variable INFORMIXCONCSMCFG before starting the instance to point to a file containing something like:
SPWDCSM("/opt/informix/lib/csm/", "", "p=1")

This is covered in more detail elsewhere and I haven’t covered using CSDK here, but for JDBC connections that’s all there is to know.
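For reference, a server alias configured for SPW typically has the CSM option in the fifth (options) field of sqlhosts, something like the entry below; the alias, host and port are placeholders:

dev_spw  onsoctcp  myhost  9090  csm=(SPWDCSM)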

Unfortunately there is a bug in JDBC 3.70.JC8W1 and JDBC 4.10.JC7 and below where every 100th connection attempt or so will fail randomly with this stack:

java.sql.SQLException: CSM: authentication error.
at com.informix.jdbc.IfxSqliConnect.<init>(
at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
at java.lang.reflect.Constructor.newInstance(
at com.informix.jdbc.IfxDriver.connect(
at java.sql.DriverManager.getConnection(
at java.sql.DriverManager.getConnection(
at Connect.main(
Caused by: com.informix.asf.IfxASFRemoteException:
at com.informix.asf.Connection.recvBindResponse(
at com.informix.asf.Connection.establishConnection(
at com.informix.asf.Connection.<init>(
at com.informix.jdbc.IfxSqliConnect.<init>(
... 7 more

You can see if you’re vulnerable by compiling this app and running it until it fails or you’re reasonably confident you don’t have a problem:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class Connect
{
    public static void main(String[] args)
    {
        Connection conn = null;
        String url = "jdbc:informix-sqli://hostname:port/dbname:INFORMIXSERVER=informixserver;user=user;password=password;SECURITY=PASSWORD";

        try {
            Class.forName("com.informix.jdbc.IfxDriver");
        } catch (Exception e) {
            System.out.println("FAILED to load Informix JDBC driver.");
        }

        int i = 0;
        while (true) {
            i++;
            conn = null;
            try {
                conn = DriverManager.getConnection(url);
            } catch (SQLException e) {
                System.out.println("FAILED to connect! " + e);
            }
            if (conn != null) {
                System.out.println("Connected " + i);
                try {
                    conn.close();
                } catch (SQLException e) {
                    System.out.println("FAILED to disconnect! " + e);
                }
            }
        }
    }
}
If your application handles failed connections and retries automatically you might not have noticed this error, or perhaps it was lost in the noise, but for simpler applications it can be a pain.

Fortunately this is fixed in JDBC 4.10.JC8 and the fix works with 11.70 and 12.10 versions of the servers.

Interestingly the JDBC release notes for 4.10.JC8 are coy about this, showing just one fix.


Buffer waits

Is it really a year since I last wrote a blog post? It does mean I have had time to think of a few topics to write about.

Recently a desk visitor came to me at work about a performance issue. From what he said I got the impression that he thought that DBAs spend most of their time tuning SQL queries, something I actually spend little time on during a typical day, perhaps one of the advantages of working on a mature system.

Perhaps though he was kind of right. Many of the things I do are around making sure queries run reliably, consistently and in a scalable manner; I just don’t necessarily do this by looking at explain plans. Don’t get me wrong: these are very important; it’s just that once these are as good as they can be it doesn’t mean there aren’t other ways of finding bottlenecks or contention and tuning them out. For example when I was a more junior DBA I used to concern myself with buffer cache hits and, once I knew a little more, buffer turnover ratios, reducing I/O by allocating more memory as Moore’s Law provided rapid improvements in server CPU speed and memory size.

In the Oracle world DBAs have moved away from this measure and use the Oracle wait interface which allows you to see what operations the engine was doing. We can do this (slightly differently) in Informix too and it’s very useful. The simplest overview of waits is to look at position one of the flags column from “onstat -u”. The Informix documentation states:

Provides the status of the session.
The flag codes for position 1:
B Waiting for a buffer
C Waiting for a checkpoint
G Waiting for a write of the logical-log buffer
L Waiting for a lock
S Waiting for mutex
T Waiting for a transaction
Y Waiting for condition
X Waiting for a transaction cleanup (rollback)

Most sessions will probably show Y which usually means they are waiting on TCP transit or are idle, waiting for the application or user’s session to do something.
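A quick way to get an overview of what sessions are waiting on is to count the first character of the flags column. A rough sketch, assuming the standard onstat -u layout where flags is the second column (header and summary lines are filtered out by matching the flags pattern):

onstat -u | awk '$2 ~ /^[A-Z-]+$/ { print substr($2, 1, 1) }' | sort | uniq -c | sort -rn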

As the manual says we can get more detail for buffers through onstat -b, -B and -X; for latches (mutexes) through onstat -s, (also -g lmx and -g wmx) and locks with onstat -k.

onstat -X is the most useful way to examine buffer waits but it is not as user-friendly as it could be, which is a shame because it often offers many clues to where problems lie:

Buffers (Access)
address owner flags pagenum memaddr nslots pgflgs scount waiter

Buffer pool page size: 2048
14700 modified, 16777216 total, 16777216 hash buckets, 2048 buffer size

Buffer pool page size: 4096
2443cd7e8 ffffffffffffffff 80 25:1523486 cce17b000 101 2801 0 0
267ca4628 0 0 47:1570054 105c3c5000 122 2890 1 0
53980 modified, 8388608 total, 8388608 hash buckets, 4096 buffer size

Buffer pool page size: 8192
59577 modified, 8388608 total, 8388608 hash buckets, 8192 buffer size

Buffer pool page size: 16384
3784a8188 ffffffffffffffff 80 162:18904760 4baadf4000 248 890 0 0
37854d188 ffffffffffffffff 80 162:24581408 4baeff4000 248 890 0 0
378ead5e8 ffffffffffffffff 80 124:25597240 4beb010000 187 2801 0 0
378f781a8 ffffffffffffffff 80 124:25597376 4bf0128000 710 890 0 0
3798d3388 ffffffffffffffff 80 124:25597176 4c2bf34000 710 890 0 595236d428
3799321a8 ffffffffffffffff 80 162:13196672 4c2e528000 248 890 0 624d39d668
37a353128 ffffffffffffffff 80 124:25597840 4c6f258000 197 801 0 0
37a4cefe8 ffffffffffffffff 80 168:32233760 4c78a50000 399 890 0 0
37c485d28 ffffffffffffffff 80 264:13942672 4d439d8000 319 890 0 0
37c5b45c8 ffffffffffffffff 80 162:24158848 4d4b2dc000 193 2801 0 0
37c80f368 ffffffffffffffff 80 168:33303832 4d5a400000 303 890 0 0
37caf6ce8 0 10 124:25597160 4d6cd70000 710 890 1 0
37ceaab28 ffffffffffffffff 80 166:8227296 4d84898000 332 890 0 0
37ceba8e8 ffffffffffffffff 80 124:25597648 4d84ef0000 710 890 0 0
37d70f4a8 ffffffffffffffff 80 124:25597208 4dba408000 193 801 0 0
37d891088 ffffffffffffffff 80 162:26376432 4dc3e54000 248 890 0 0
37dc9abe8 58cc3c7eb8 80 144:18435488 4dddbd0000 193 2801 0 0
87962 modified, 13762560 total, 16777216 hash buckets, 16384 buffer size

The key problem here from a usability point of view is that it is tedious to convert the chunk:pagenum format into an actual database object.

There is a similar problem with part numbers when deciphering output from, for example onstat -k that can be solved by downloading and compiling the ESQL/C utility partn from the IIUG software repository.

Loosely based on that here is my Perl script, chunkpg, which can provide friendly names for objects in chunk:pagenum format:


#!/usr/bin/perl

# chunkpg: decipher chunk and page numbers by mapping chunk:pagenum values
# in onstat output to object names using 'oncheck -pe'

use strict;
use warnings;

&main;

sub main {

    my $key = &check_params;

    my $chunkno = 0;
    my $nextinfo = 0;
    my $pagesize = 0;
    my $count = 0;
    my $lastdbs = '';
    my $syspagesize = 2; # change to 4 kb if required

    my (%objs, %start, %end, %c, %ps, %dbs);

    # Build a map of chunk number -> object extents from 'oncheck -pe'
    open (ONCHECKPE, "oncheck -pe |");
    while (<ONCHECKPE>) {
        chomp;
        if ($nextinfo == 1) {
            $nextinfo = 0;
            my ($blank, $path, $size, $used, $free);
            ($blank, $chunkno, $path, $pagesize, $size, $used, $free) = split / +/;
            # print "$chunkno: $pagesize kb\n";
            $count = 0;
            $c{$chunkno} = 0;
            $ps{$chunkno} = $pagesize;
        }
        elsif ($_ eq ' Chunk Pathname Pagesize(k) Size(p) Used(p) Free(p)') {
            $nextinfo = 1;
        }
        elsif ($_ =~ /^DBspace Usage Report: /) {
            my @parts = split / +/;
            $lastdbs = $parts[3];
            $dbs{$chunkno} = $lastdbs;
        }
        elsif ($_ =~ /^ \w+:\'\w+\'\.\w+ +/) {
            # Extent line: record the object name and its start/end offsets
            $count++;
            $c{$chunkno} = $count;
            my ($blank, $obj, $offset, $size) = split / +/;
            # printf ("%s: %d -> %d\n", $obj, $offset*2, $offset*2+$size*2);
            $objs{$chunkno}{$count} = $obj;
            $start{$chunkno}{$count} = $offset;
            $end{$chunkno}{$count} = $size + $offset;
            if (!$dbs{$chunkno}) {
                $dbs{$chunkno} = $lastdbs;
            }
        }
    }
    close (ONCHECKPE);

    # Read onstat output on stdin and append the object name to matching lines
    while (<STDIN>) {
        chomp;
        print "$_ ";
        $_ =~ s/^ +//;
        my @vals = split / +/;
        if ($vals[$key-1] && $vals[$key-1] =~ /\d+\:\d+/) {
            my ($chunk, $page) = split /\:/, $vals[$key-1];
            for (my $i = 1; $c{$chunk} && $i <= $c{$chunk}; $i++) {
                if ($page/($ps{$chunk}/$syspagesize) >= $start{$chunk}{$i} && $end{$chunk}{$i} && $page/($ps{$chunk}/$syspagesize) <= $end{$chunk}{$i}) {
                    print $objs{$chunk}{$i}.','.$dbs{$chunk};
                    last;
                }
            }
        }
        print "\n";
    }
}

sub check_params () {

    die print "INFORMIXDIR is not set.\n" unless $ENV{'INFORMIXDIR'};
    die print "INFORMIXDIR is not set to a valid directory (\'$ENV{'INFORMIXDIR'}\').\n" unless -d $ENV{'INFORMIXDIR'};
    die print "INFORMIXSERVER is not set.\n" unless $ENV{'INFORMIXSERVER'};

    if ($ARGV[0] && $ARGV[0] eq '-k') {
        if ($ARGV[1] && $ARGV[1] !~ /\D/) {
            return ($ARGV[1]);
        }
        else {
            print "Invalid key number: $ARGV[1]\n";
            exit 1;
        }
    }
    elsif ($ARGV[0]) {
        print "Invalid parameter: $ARGV[1]\n";
        exit 1;
    }
    return (1);
}

You can then run as follows: onstat -X | chunkpg -k 4

If you were to run onstat -X repeatedly, perhaps at different times of day, you would begin to build a picture of where buffer waits are occurring.

Having identified buffer waits, what can be done to reduce them?

Ultimately it is going to come back to your database design, including its storage schema, the queries you run and maybe onconfig parameters.

Buffer waits on tables can be reduced using partitioning. Fragmentation by round-robin is effective for improving the rate of concurrent inserts (although it doesn’t facilitate fragment elimination in queries). Very large tables may require some form of partitioning to avoid reaching the 16.7m (2^24) page limit but even smaller tables with a large number of inserts and/or deletes can benefit.

It’s worth noting here that buffer waits occur in server RAM and so it’s not obvious at all that the storage schema should come into it. After all, it only directly affects what happens on disk, right? Not so: I can show that different storage schemas perform differently by running a concurrent insert test on a machine tuned to only flush data to disk at checkpoints. If a table has 10-way round-robin partitioning then it has ten different part numbers and is for many purposes ten different tables. For inserts we are always appending to the end of the table and so there will be contention on the last buffered page between sessions. Spreading this across ten different partitions reduces it.
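As an illustration, converting an existing table to round-robin fragmentation is a single statement; the table and dbspace names below are hypothetical and you can list as many dbspaces as you need:

ALTER FRAGMENT ON TABLE orders
    INIT FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2, dbspace3, dbspace4;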

Indices can also see a large number of buffer waits. In one stark example I found a large number of buffer waits were occurring on a large single-column index where every value in the column was null (nulls are indexed in Informix, unlike some other RDBMSs) and so there was a single leaf node pointing to all rows. Every insert and delete needed to modify this. A less extreme version of this might be seen where there are a limited number of values a field can take. In this case dropping the index or combining the index column with another to vastly increase the number of nodes would help throughput.

Would index fragmentation/partitioning help? It is probably less useful in general because there is no 16.7m page limit for indices since version 11.70 and, in the case of fragmentation by expression, the same nodes could be equally congested.

In conclusion I think this method is a useful addition to your performance tuning armoury and by reducing contention you can improve the scalability of your system and increase throughput.