RSS_FLOW_CONTROL
Posted: 15 July, 2021
Most of the time I like my blog to be positive or at least mostly positive with a few caveats. This post is entirely caveats.
RSS_FLOW_CONTROL, please permit me to vent about this.
Man, do I dislike this feature! Where to start?
Perhaps with what it does. This feature is meant to ensure that the last acknowledged log received by an RSS server does not fall too far behind the primary’s current log position in an Informix cluster.
It sounds innocuous enough, but how is this achieved? Given that your RSS server is always working as fast as it can, and its network link with your primary is what it is, this can only be done by temporarily stopping the primary from writing until the RSS catches up. Many applications won’t respond well to this.
Someone somewhere probably had a use case for this, along the lines of how much data loss they could tolerate in the event of losing their primary site, but is this, and the way it is implemented, what most users want? Probably not. It’s on by default, by which I mean it is enabled in the onconfig.std shipped with the product. Not only is it on, but the default thresholds, activating when the backlog reaches just 12x the log buffer size (default 64 kB) and deactivating when it drops to 11x, mean it can kick in the moment your network hiccups.
The onconfig.std ships with the value ‘0’ which, at a casual glance, suggests it should be off, but it actually means ‘on’, with the aggressive defaults described above. It is ‘-1’ that turns it off; to be fair, there are comments in the file making this clear. The exception is 12.10.xC14, where a bug means it is on even when set to ‘-1’, again with those aggressive defaults.
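To make the defaults concrete, here is a small Python sketch of how the setting is interpreted according to the rules above: ‘-1’ is off, ‘0’ is on with thresholds of 12x and 11x the log buffer size, and on 12.10.xC14 ‘-1’ behaves like ‘0’. The function and class names are hypothetical, and representing an explicit setting as a (start, stop) pair in kB is an assumption for illustration, not the real onconfig syntax.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Defaults described above: throttle at 12x the log buffer, release at 11x.
DEFAULT_START_MULTIPLIER = 12
DEFAULT_STOP_MULTIPLIER = 11


@dataclass(frozen=True)
class FlowControl:
    enabled: bool
    start_kb: Optional[int] = None  # backlog at which the primary is held up
    stop_kb: Optional[int] = None   # backlog at which the primary is released


def effective_flow_control(setting: Union[int, Tuple[int, int]],
                           log_buffer_kb: int = 64,
                           affected_by_xc14_bug: bool = False) -> FlowControl:
    """Hypothetical model of how an RSS_FLOW_CONTROL value is interpreted."""
    if setting == -1 and not affected_by_xc14_bug:
        return FlowControl(enabled=False)
    if setting in (0, -1):
        # '0' means "on, with defaults"; on 12.10.xC14 '-1' behaves the same.
        return FlowControl(True,
                           DEFAULT_START_MULTIPLIER * log_buffer_kb,
                           DEFAULT_STOP_MULTIPLIER * log_buffer_kb)
    # Assumption: explicit values are modelled as a (start, stop) pair in kB.
    start_kb, stop_kb = setting
    if stop_kb >= start_kb:
        raise ValueError("stop threshold must be below the start threshold")
    return FlowControl(True, start_kb, stop_kb)
```

With the default 64 kB log buffer, `effective_flow_control(0)` gives thresholds of 768 kB and 704 kB, so a backlog of well under a megabyte is enough to hold up the primary.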
How do you know when it’s operating? You may capture a lot of user threads waiting on the logical log buffer (state ‘G’), but this could be the result of any of a myriad of other issues. The only way to know for sure is to catch it in the act with onstat -g rss verbose and compare:
RSS flow control:3072/2816 ... Approximate Log Page Backlog:3232
As 3232 > 3072, flow control is active, and it won’t stop until the backlog drops below 2816. There is no simple ‘disabled/off/on’ status to check.
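The comparison above can be automated. The following Python sketch pulls the two thresholds and the backlog out of onstat -g rss verbose text and classifies the state. The regular expressions are based only on the excerpt quoted here, so real output may need different patterns, and the function name is hypothetical. Because of the hysteresis, a backlog between the two thresholds could belong to either state, so the sketch reports it as indeterminate rather than guessing.

```python
import re
from typing import Optional

# Patterns based only on the excerpt quoted above; the real
# onstat -g rss verbose layout may differ.
_THRESHOLDS = re.compile(r"RSS flow control:\s*(\d+)/(\d+)")
_BACKLOG = re.compile(r"Approximate Log Page Backlog:\s*(\d+)")


def flow_control_state(onstat_text: str) -> Optional[str]:
    """Return 'active', 'inactive', 'indeterminate', or None if fields are missing.

    Flow control switches on above the first threshold and only switches off
    once the backlog falls below the second, so a value in between cannot be
    classified from a single snapshot.
    """
    thresholds = _THRESHOLDS.search(onstat_text)
    backlog = _BACKLOG.search(onstat_text)
    if thresholds is None or backlog is None:
        return None
    start, stop = int(thresholds.group(1)), int(thresholds.group(2))
    pages = int(backlog.group(1))
    if pages > start:
        return "active"
    if pages < stop:
        return "inactive"
    return "indeterminate"
```

Fed the excerpt above, it returns "active".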
Most of what I have written is in the manual; what isn’t is how this can be on when you never consciously enabled it, and how it is impossible after the event to tell that it operated unless your diagnostics captured that one onstat at the right time. Maybe this post will help someone with a future investigation.
Given where we are now, the following would, in my opinion, be useful usability enhancements:
- ship onconfig.std with RSS_FLOW_CONTROL -1
- print a line in the message log when this feature operates (may need to be rate limited).
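Until something like the second suggestion exists, a monitoring script could approximate it. Below is a minimal Python sketch of a rate-limited transition logger: it writes one line whenever the flow control state changes, but never more often than a minimum interval. The class name and message text are hypothetical; it takes a state string such as "active" or "inactive", however you derive it (for example, by comparing the onstat figures as described above), and ignores anything else.

```python
import time
from typing import Callable, Optional


class FlowControlLogger:
    """Hypothetical helper: log one line per flow control state change, rate limited."""

    def __init__(self, min_interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sink: Callable[[str], None] = print) -> None:
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.sink = sink
        self.reported_state: Optional[str] = None  # last state actually written
        self.last_emit: Optional[float] = None     # clock value of last write

    def observe(self, state: Optional[str]) -> bool:
        """Return True if a line was written for this observation.

        The first valid observation is always written, establishing a baseline.
        """
        if state not in ("active", "inactive") or state == self.reported_state:
            return False
        now = self.clock()
        if self.last_emit is not None and now - self.last_emit < self.min_interval_s:
            # Too soon: skip for now. Because reported_state is unchanged,
            # a persisting change is written on a later call.
            return False
        self.sink(f"RSS flow control is now {state}")
        self.reported_state = state
        self.last_emit = now
        return True
```

Injecting the clock and the output sink keeps the sketch testable; in practice the sink could append to whatever log file your monitoring already keeps.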
If you are waiting for my post on large table operations, it is coming; venting is just easier.