RSS, delayed apply and log staging directory

After a couple of RSS troubles this week, I thought I’d do a quick post to cover a few points which are not covered in the IBM documentation.

Using RSS with delayed apply brings into play the log staging directory, controlled by the onconfig parameter LOG_STAGING_DIR. When RSS is running normally you will only see enough logs in here to support the apply delay you’ve set. IBM don’t appear to make a recommendation on how much space to allow for this directory, and you might consider it reasonable to work out how many logs you’ll get through during your busiest time and add a safety margin. My recommendation is that it needs to be large enough to accommodate a full set of online logical logs, and furthermore that it should reside on a separate file system to ensure this space is always available.
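
As a rough illustration of that sizing rule, here is a minimal Python sketch. It assumes every logical log is the same size, so that a full set is simply LOGFILES multiplied by LOGSIZE (in KB) from the onconfig file. If your logs vary in size, substitute the real total from onstat -l. The function names and the 20% margin are my own choices for illustration, not anything IBM recommends.

```python
import shutil


def staging_space_needed_kb(logfiles: int, logsize_kb: int, margin_percent: int = 20) -> int:
    """Space (KB) for a full set of logical logs plus a safety margin.

    Assumes all logs are logsize_kb in size; integer arithmetic is used
    so the result is rounded up exactly.
    """
    full_set_kb = logfiles * logsize_kb
    # Ceiling division: round up to the next whole KB.
    return -((-full_set_kb * (100 + margin_percent)) // 100)


def filesystem_is_big_enough(path: str, needed_kb: int) -> bool:
    """True if the file system holding 'path' has at least needed_kb of total capacity."""
    total_kb = shutil.disk_usage(path).total // 1024
    return total_kb >= needed_kb
```

For example, 50 logs of 10000 KB each with a 20% margin comes out at 600000 KB, which you could then compare against the dedicated file system with filesystem_is_big_enough.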

The process of pairing up an RSS secondary with the primary server in your cluster takes place only after you’ve backed up your primary and restored it on the RSS secondary, so you might have gone through quite a few logical logs in this time. You may also end up with a large gap if your network link has failed, for example. The use of the staging directory decouples the shipping of the logs from the primary to the RSS server from their application. So once the primary starts shipping logs to the RSS server it can send them as fast as your network link will allow, and the RSS server just writes them to the staging directory, unhindered by the apply process. Shipping can be much faster than applying, so the staging directory can rapidly fill up. If you run out of space in the log staging directory this will be reported on the RSS server:

ERROR:log staging aborted due to IO error (errno:28)
No space left on device

To fix this I just let the RSS server run: it applies the logs in the staging directory, deleting them as it goes, after which it stops applying. Then I simply restarted it.
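
To get some warning before the directory fills, you could poll the free space on the staging file system. The following is only a sketch of that idea in Python: the function names and the 25% threshold are hypothetical, and it looks purely at the operating system's view of the file system, not at anything Informix reports.

```python
import shutil


def free_space_percent(path: str) -> float:
    """Percentage of the file system holding 'path' that is still free."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def staging_space_is_low(path: str, min_free_percent: float = 25.0) -> bool:
    """True if free space on the staging file system has dropped below the threshold."""
    return free_space_percent(path) < min_free_percent
```

Run from cron against the directory named in LOG_STAGING_DIR, a check like this gives you a chance to react while the RSS server is still catching up.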

A little gremlin I have found is that you may hit a small problem if you use RSS and have implemented role-separation. Role-separation allows you to change the group of the $INFORMIXDIR/etc directory to a group of which your DBAs are a member.

The manual states:

After the installation is complete, INF_ROLE_SEP has no effect. You can establish role separation manually by changing the group that owns the aaodir, dbssodir, or etc directories. You can disable role separation by resetting the group that owns these directories to informix. You can have role separation enabled for the AAO without having role separation enabled for the DBSSO.

Role separation control is through the following group memberships:

  • Users who can perform the DBSA role are group members of the group that owns the directory $INFORMIXDIR/etc.
  • Users who can perform the DBSSO role are group members of the group that owns the $INFORMIXDIR/dbssodir directory.
  • Users who can perform the AAO role are group members of the group that owns the $INFORMIXDIR/aaodir directory.

Note: For each of the groups, the default group is the group informix.

And the RSS documentation states:

The directory specified by the LOG_STAGING_DIR configuration parameter must be secure. The directory must be owned by user informix, must belong to group informix, and must not have public read, write, or execute permission.

So when you set up RSS for the first time you won’t be surprised to find that a subfolder, with group informix, is created under the folder specified as LOG_STAGING_DIR in your onconfig file.

All well and good, but when you restart the server you then see a message like:

Secondary Delay or Stop Apply: The log staging directory () is not secure.

The directory will be as specified in the manual so this message will be unexpected. Manually altering the group on the log staging directory to the group that owns $INFORMIXDIR/etc and restarting the RSS server fixes the problem.
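
If you want to catch this before a restart, a check along these lines may help. It is a minimal Python sketch based on my observation above rather than on documented behaviour: it assumes the server expects the staging directory's group to match the group that owns $INFORMIXDIR/etc, and that there must be no public permissions. The function name is hypothetical, and the check on the owning user (informix) is left out for brevity.

```python
import os
import stat


def staging_dir_problems(staging_dir: str, etc_dir: str) -> list:
    """Return a list of reasons the staging directory might be rejected as not secure."""
    problems = []
    staging = os.stat(staging_dir)
    etc = os.stat(etc_dir)

    # Assumption: with role separation, the group must match that of $INFORMIXDIR/etc.
    if staging.st_gid != etc.st_gid:
        problems.append(
            "group id %d differs from group id %d on %s"
            % (staging.st_gid, etc.st_gid, etc_dir)
        )

    # The manual requires no public read, write, or execute permission.
    if staging.st_mode & stat.S_IRWXO:
        problems.append("directory has public permissions")

    return problems
```

You would call it with the directory from LOG_STAGING_DIR and the path to $INFORMIXDIR/etc. An empty list means neither of the two conditions checked here was found.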