RSS, delayed apply and log staging directory

After a couple of RSS troubles this week, I thought I’d do a quick post on a few points that aren’t covered in the IBM documentation.

Using RSS with delayed apply brings into play the log staging directory, controlled by the onconfig parameter LOG_STAGING_DIR. When RSS is running normally you will only see enough logs in this directory to support the apply delay you’ve set. IBM don’t appear to make a recommendation on how much space to allow for it; you might consider it reasonable to work out how many logs you get through during your busiest period and add a safety margin. My recommendation is that it needs to be large enough to accommodate a full set of online logical logs, and that it should reside on its own file system to ensure this space is always available.
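
As a quick sanity check you can compare the total size of your logical logs with the space available to the staging area. This is only a rough sketch: onstat -l reports log sizes in pages, and the staging path is a placeholder for whatever you have set in LOG_STAGING_DIR.

onstat -l                          # lists each logical log and its size in pages
du -sk /path/to/log_staging        # space currently used by staged logs
df -k /path/to/log_staging         # confirm the staging area is a file system of its own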

The process of pairing up an RSS secondary with the primary server in your cluster takes place only after you’ve backed up your primary and restored it on the RSS secondary, so you might have gone through quite a few logical logs in this time. You may also end up with a large gap if, for example, your network link has failed. The use of the staging directory decouples the shipping of logs from the primary to the RSS server from their application. Once the primary starts shipping logs to the RSS server it can send them as fast as your network link will allow, writing them to the staging directory unhindered by the apply process. This could be much faster than the apply process and the staging directory could rapidly fill up. If you run out of space in the log staging directory this will be reported on the RSS server:

ERROR:log staging aborted due to IO error (errno:28)
No space left on device

To fix this I just let the RSS server run and apply the logs remaining in the staging directory, deleting them as it went, after which it stopped applying. Then I simply restarted it.
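
For reference, the pairing procedure I’m describing is roughly the following. It is a sketch only, with placeholder server names, so check the Administrator’s Guide for your version:

onmode -d add RSS rss_server       # on the primary: register the RSS secondary
ontape -s -L 0                     # on the primary: take a level-0 backup
ontape -p                          # on the secondary: physical restore of that backup
onmode -d RSS primary_server       # on the secondary: connect back to the primary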

Another gremlin I have found is that you may hit a problem if you use RSS and have implemented role separation. Role separation allows you to change the group of the $INFORMIXDIR/etc directory to a group of which your DBAs are members.

The manual states:

After the installation is complete, INF_ROLE_SEP has no effect. You can establish role separation manually by changing the group that owns the aaodir, dbssodir, or etc directories. You can disable role separation by resetting the group that owns these directories to informix. You can have role separation enabled for the AAO without having role separation enabled for the DBSSO.

Role separation control is through the following group memberships:

  • Users who can perform the DBSA role are group members of the group that owns the directory $INFORMIXDIR/etc.
  • Users who can perform the DBSSO role are group members of the group that owns the $INFORMIXDIR/dbssodir directory.
  • Users who can perform the AAO role are group members of the group that owns the $INFORMIXDIR/aaodir directory.

Note: For each of the groups, the default group is the group informix.

And the RSS documentation states:

The directory specified by the LOG_STAGING_DIR configuration parameter must be secure. The directory must be owned by user informix, must belong to group informix, and must not have public read, write, or execute permission.

So when you set up RSS for the first time, you won’t be surprised to find that a subdirectory is created under the directory specified as LOG_STAGING_DIR in your onconfig file, owned by group informix.

All well and good, but when you restart the server you then see a message like this:

Secondary Delay or Stop Apply: The log staging directory () is not secure.

The directory will be set up exactly as the manual specifies, so this message is unexpected. Manually changing the group on the log staging directory to the group that owns $INFORMIXDIR/etc and restarting the RSS server fixes the problem.
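
In other words, assuming your DBA group is called dbsagrp (a made-up name; use whatever group owns your etc directory) and an example staging path, the workaround looks something like this:

ls -ld $INFORMIXDIR/etc                  # note the group that owns etc, e.g. dbsagrp
chgrp -R dbsagrp /path/to/log_staging    # the path set in LOG_STAGING_DIR
onmode -ky                               # stop the RSS secondary
oninit                                   # and start it again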


Converting fragment by expression to fragment by range

I really like the fragment by range method of partitioning, new in 11.70, and have blogged about it before, but are IBM missing a trick somewhere?

A common scenario is a table fragmented by expression on a date or datetime column, say with one partition per month, that’s come across from your 11.50 or earlier database instance. This sort of thing sounds ideal for the new fragment by range feature as it automates the adding of a new partition each month, meaning that you don’t have to do it yourself or via an application, or allocate loads of partitions up front, wasting valuable space.

So can we easily convert a fragment by expression table to a fragment by range table? By this I mean without row movement. Well, it seems not although I welcome anyone showing me how it should be done. Take this relatively simple index-less table:

create table conversion_test (
test_id serial not null,
ttimestamp datetime year to second not null
) fragment by expression
partition prehistory (ttimestamp < '2012-10-01 00:00:00') in dbspace1,
partition oct_2012 (ttimestamp < '2012-11-01 00:00:00') in dbspace2,
partition nov_2012 (ttimestamp < '2012-12-01 00:00:00') in dbspace3
extent size 1024 next size 256 lock mode row;

Before we go any further it’s worth noting that I’ve chosen my fragmentation strategy fairly carefully to be compatible with the fragment by range syntax, which doesn’t seem to accept operators other than '<' for the initial fragments, something the manual doesn’t state.
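
For contrast, a more typical legacy strategy might look like the hypothetical table below, with an explicit lower bound on each fragment and a remainder; that shape doesn’t map directly onto the simple ascending series of '<' tests that the range syntax wants:

create table conversion_test2 (
test_id serial not null,
ttimestamp datetime year to second not null
) fragment by expression
partition oct_2012 (ttimestamp >= '2012-10-01 00:00:00' and ttimestamp < '2012-11-01 00:00:00') in dbspace2,
partition nov_2012 (ttimestamp >= '2012-11-01 00:00:00' and ttimestamp < '2012-12-01 00:00:00') in dbspace3,
partition other_dates remainder in dbspace1
extent size 1024 next size 256 lock mode row;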

Insert some data:

insert into conversion_test (ttimestamp) values ('2012-09-01 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-09-02 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-10-01 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-10-02 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-11-01 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-11-02 00:00:01');

Let’s try and convert this to fragment by range:

alter fragment on table conversion_test init
fragment by range (ttimestamp) interval (1 units month) store in (dbspace4)
partition prehistory (ttimestamp < '2012-10-01 00:00:00') in dbspace1,
partition oct_2012 (ttimestamp < '2012-11-01 00:00:00') in dbspace2,
partition nov_2012 (ttimestamp < '2012-12-01 00:00:00') in dbspace3;

When running this I am hoping that:

  • The data do not move and the partitions remain as they are.
  • The only thing that changes is that if I now insert data for December 2012, a new partition is automatically created.

Does it work? Well, it runs. Unfortunately, though, row movement occurred, which we can see from the logical log:

fffff018 56 BEGIN 26 10 0 11/05/2012 10:46:15 92 informix
fffff050 388 BLDCL 26 0 4018 100274 512 128 12 0 conversion_test
fffff1d4 52 CHALLOC 26 0 4050 00001:0000496520 200
fffff208 56 PTEXTEND 26 0 41d4 100274 511 00001:0000496520
fffff240 388 BLDCL 26 0 4208 100275 512 128 12 0 conversion_test
fffff3c4 52 CHALLOC 26 0 4240 00001:0000497032 200
fffff3f8 56 PTEXTEND 26 0 43c4 100275 511 00001:0000497032
fffff430 388 BLDCL 26 0 43f8 100276 512 128 12 0 conversion_test
fffff5b4 52 CHALLOC 26 0 4430 00001:0000497544 200
fffff5e8 56 PTEXTEND 26 0 45b4 100276 511 00001:0000497544
...
fffff6b8 76 HINSERT 26 0 4688 100274 101 12
fffff704 76 HINSERT 26 0 46b8 100274 102 12
fffff750 76 HINSERT 26 0 4704 100275 101 12
fffff79c 76 HINSERT 26 0 4750 100275 102 12
fffff7e8 76 HINSERT 26 0 479c 100276 101 12
fffff050 76 HINSERT 26 0 47e8 100276 102 12

I don’t like that, as on a big table this would be a long logged operation, so I am going to cheat and show that this simple table can be switched to fragment by range by hacking around in the system catalog tables (don’t do this at home and definitely nowhere near a production system).

By cloning my database and performing the change to fragment by range, I can see what the entries in the sysfragments table look like before and after, and unload them to files.

unload to conversion_test_sysfragments.unl
select * from sysfragments where tabid=(select tabid from systables where tabname='conversion_test');

I get three rows from before and six rows from after.
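
Before editing anything, it’s worth eyeballing the relevant sysfragments columns in both databases so you can see exactly what changed. A sketch along these lines works (exprbin is a binary column, so the unload files are the place to compare that one):

select evalpos, partn, dbspace, exprtext
from sysfragments
where tabid = (select tabid from systables where tabname = 'conversion_test')
order by evalpos;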

I now need to fiddle with the unload file from after the alter, putting the values for partn and exprbin from before the alter back into the three original rows, because Informix changed these when it re-initialised the table and with my cheat method the partition numbers will not change. I then revert to my original database, which still has the table fragmented by expression, and load in the new sysfragments entries:

delete from sysfragments where tabid=109;

3 row(s) deleted.

load from conversion_test_sysfragments.unl insert into sysfragments;

6 row(s) loaded.

Dbschema shows the result immediately and it seems I don’t even have to bounce the engine:

{ TABLE "informix".conversion_test row size = 12 number of columns = 2 index size = 0 }

create table "informix".conversion_test
(
test_id serial not null ,
ttimestamp datetime year to second not null
)
fragment by range(ttimestamp) interval(interval( 1) month(9) to month) store in(dbspace4)
partition prehistory VALUES < datetime(2012-10-01 00:00:00) year to second in dbspace1,
partition oct_2012 VALUES < datetime(2012-11-01 00:00:00) year to second in dbspace2,
partition nov_2012 VALUES < datetime(2012-12-01 00:00:00) year to second in dbspace3
extent size 1024 next size 256 lock mode row;

And everything still appears to work, in that I can select data and insert new rows, including rows for December 2012 and beyond, where a new partition is automatically created.
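
A quick way to convince yourself, using the same subquery as before, is to insert a December row and then check that a new, system-generated fragment has appeared in sysfragments:

insert into conversion_test (ttimestamp) values ('2012-12-01 00:00:01');

select partn, dbspace, exprtext
from sysfragments
where tabid = (select tabid from systables where tabname = 'conversion_test');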

So what does this post prove? It shows in principle that it’s possible to convert a table fragmented by expression with no indices to one with an identical initial fragmentation strategy fragmented by range, at least where the original strategy is compatible with the range partitioning. I don’t see in principle why this would not also work with indices in place, although I have not tested it. It seems like a nice feature to put into Informix at relatively low effort. One for V.next?


Getting the most out of an Innovator-C instance

IBM Informix Innovator-C edition provides a free version of the Informix engine, albeit with memory, CPU and feature restrictions. IBM describes it as useful for development, test and small production use, which is a fair summary. However, even for a test system you may find Innovator-C edition restrictive. To make things worse, with Informix 11.70 IBM reduced the number of CPU virtual processors (VPs) allowed from four to one and withdrew the 11.50 version, with its more generous limit. It also switched out the HDR/ER functionality. I suspect this was done because Innovator-C edition was taking revenue away from the paid-for editions.

This blog post is about getting the most out of an Innovator-C instance. The most obvious point is that if you have an 11.50 download lying around you can still use it and configure it for up to four CPU VPs, unless you need 11.70 features of course.

Let’s take a look at what we can do with onconfig.

As you are restricted to a single CPU VP with 11.70, you’ll need to set the system up for this:

MULTIPROCESSOR 1
VPCLASS cpu,num=1,noage
VP_MEMORY_CACHE_KB 0
SINGLE_CPU_VP 1

I have set MULTIPROCESSOR to 1 because I am using a multi-processor machine. Unfortunately VP_MEMORY_CACHE_KB is another feature you can’t use with Innovator-C edition, so I have set that to zero too. SINGLE_CPU_VP is set to 1 as it switches out some checks the engine performs when there is more than one CPU VP. You may as well set this, even though the performance difference is small, because setting VPCLASS to anything other than num=1 or dynamically adding CPU VPs is disallowed anyway.

The 2 Gb memory limit can prevent the engine adding extra memory segments as required. This is a serious problem if your application is greedy with memory. Keeping cursors open or having an excessive number of connections to the database are common causes of this; fixing the root cause is the best approach but not something a DBA can directly control. You’ll therefore have to make sure either that your engine has enough shared memory at start-up or that you keep the BUFFERPOOLs small enough that there is free memory under the 2 Gb limit for adding segments to work. If things are really tight, you will need to configure minimal BUFFERPOOLs and assign every last kilobyte to the virtual segment. Furthermore, failing to add a shared memory segment because of a memory limit can sometimes cause an assert failure and possibly a shared memory dump, depending on the value of the onconfig parameter DUMPSHMEM. You may want to set this to something other than 1 to prevent 2 Gb being written to disk each time.
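
As an illustration only (the numbers are plucked out of the air, so size them for your own workload), the sort of thing I mean is a deliberately modest bufferpool, most of the allowance given to the virtual segment and shared memory dumps switched off:

# Illustration only: a small 2K-page bufferpool
BUFFERPOOL size=2K,buffers=50000,lrus=8,lru_min_dirty=50,lru_max_dirty=60
# Give most of the allowance to the virtual segment (values in kb)
SHMVIRTSIZE 1048576
SHMADD 65536
SHMTOTAL 2097152
# Don't dump the whole of shared memory to disk on an assert failure
DUMPSHMEM 0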

To improve performance on any single CPU VP instance, you will probably want to avoid using kernel AIO on Linux or Mac OS systems. This might sound wrong as kernel AIO is generally regarded as offering a performance advantage. On a single CPU VP instance, however, you don’t want your only CPU VP to be responsible for I/O, even if it is asynchronous. Instead use normal AIO VPs to do this work (and tune these appropriately). They run as separate oninit processes and – even though they won’t be doing much processing – they can run concurrently with your only CPU VP, and your operating system can schedule them on any other CPUs installed in your system. As kernel AIO is enabled by default, you have to set the environment variable KAIOOFF to 1 in the shell in which you start your engine. Run onstat -g iov and check you can’t see any kio class VPs.
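
In practice that amounts to something like this, assuming a standard shell and that the number of AIO VPs is tuned separately via VPCLASS in your onconfig:

export KAIOOFF=1      # must be set in the environment the engine is started from
oninit                # start the instance from that shell
onstat -g iov         # there should be no kio class VPs listed, only aio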

Innovator-C edition does not support PDQ, and so the MAX_PDQPRIORITY onconfig parameter is forced to zero. However, if you’re performing sort operations, like building an index, you can still set the PSORT_NPROCS and PSORT_DBTEMP environment variables in your session. Those PSORT threads will all have to run on the single CPU VP but you may see a performance benefit. If your server has significantly more RAM than the 2 Gb you’re restricted to using, you could set PSORT_DBTEMP to point to a RAM disk for a significant performance boost.
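
For example (the paths, database name and thread count below are placeholders, and /dev/shm is the usual Linux RAM disk location):

export PSORT_NPROCS=4                  # parallel sort threads for this session
mkdir -p /dev/shm/psort
export PSORT_DBTEMP=/dev/shm/psort     # put sort work files on the RAM disk
dbaccess mydb build_index.sql          # run the index build from this environment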

One final point is that I find using Innovator-C edition unsatisfactory for test systems, and you might too if your production environment is licensed using Growth or Ultimate edition. Your production environment may be using 11.70.FC5W1 but no such version exists for Innovator-C edition; you’d be stuck with 11.70.FC5IE, and mismatched versions may or may not be good enough for your testing. Also, having a production-sized test environment can be made difficult by Innovator-C edition which, while it ostensibly has no storage limits, does not support partitioning, and therefore table and index sizes are restricted to 256 Gb: 16 million pages times the largest page size available, 16 kb. Importing/exporting from an instance that uses partitioning is tricky, and cloning one is impossible. You may want to consider buying Growth or Ultimate edition on a named-user basis, which can be quite cost-effective.