Monitoring virtual segment usage and the CPU VP caches

A little while ago I was forced to look in detail into the memory usage of one of the production instances I look after. Specifically, the problem was that the instance was allocating extra virtual segments to itself (via the SHMADD process), but this came as a surprise because memory usage on this instance was being monitored and the monitoring clearly showed that usage was normal and well below the initial segment size.

Well, almost everything showed that. onstat -g seg was correctly reporting the memory usage. Without wishing to put in an early spoiler, this is the only way of seeing how much memory your system is really using and how close you are to your instance allocating an extra segment.
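To illustrate what watching onstat -g seg can look like, here is a minimal sketch that totals the used and free blocks across segments. The awk field positions (7 = blkused, 8 = blkfree) and the sample output are assumptions based on 11.70-era output, so check them against your own instance before relying on this:

```shell
#!/bin/bash
# Sketch: sum used and free blocks from onstat -g seg output.
# Field positions and the sample lines below are assumptions and
# may differ on your version; verify against a live instance.
parse_seg() {
    awk '/^[0-9]/ { used += $7; free += $8 }
         END { printf "used=%d free=%d\n", used, free }'
}

# In real use: onstat -g seg | parse_seg
# An illustrative sample stands in for a live instance here:
summary=$(parse_seg <<'EOF'
id       key        addr     size       ovhd    class blkused blkfree
1352     52564801   44000000 429916160  5896    R     104911  49
1353     52564802   5da00000 4194304000 490248  V     972800  51200
EOF
)
echo "$summary"
```

Graphed over time, the free figure tells you how close you are to the engine adding a segment.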

With the help of a careful analysis by IBM Informix support, we looked into memory usage on our system using:

onstat -u
onstat -g mem
onstat -g mgm
onstat -g ses

We also drilled down into individual sessions using:

onstat -g afr
onstat -g ffr
onstat -g ufr

We also looked into memory usage by onbar and SQLTRACE.

The result was a massive discrepancy between the total memory in use, as reported by these tools, and what onstat -g seg was reporting. And onstat -g seg appeared to be right because when it said the memory was all used up, the engine would allocate another segment.

So where was the memory going? Was it a leak or a bug? Well no, it was a new feature or, as we later learned, a bug fix.

In response to a performance issue reported by another customer, IBM had redesigned the VP memory cache code in 11.70.FC7W1, and it turned out that this was responsible for up to 70% of the virtual memory segment usage on our system. This is an Enterprise Edition only feature, so I guess if you run any other edition, you can stop reading at this point and just note that monitoring the output from onstat -g seg is a great idea if you want to predict segment additions.

The CPU memory cache allocates a dedicated area of the virtual memory segment to each CPU virtual processor, which it can use exclusively without contending with other virtual processors.

So a new and undocumented feature in a W1 release? Isn’t this a bit irregular? Well, no, said IBM: it was a fix for a customer issue. But it does change the behaviour from what is documented quite dramatically.

In 11.70.FC7 and earlier, the CPU memory cache size is controlled by the VP_MEMORY_CACHE_KB parameter: if you have, say, 8 CPU VPs, this results in a fixed area of memory of 8 x VP_MEMORY_CACHE_KB being allocated to the CPU memory caches, and this is still how the manual says it works.
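As a worked example of that pre-FC7W1 sizing (the VP count and parameter value here are invented for illustration):

```shell
#!/bin/bash
# Hypothetical configuration: 8 CPU VPs, VP_MEMORY_CACHE_KB set to 4096.
num_cpu_vps=8
vp_memory_cache_kb=4096
# Fixed total: one cache of VP_MEMORY_CACHE_KB per CPU VP.
total_kb=$((num_cpu_vps * vp_memory_cache_kb))
echo "Fixed VP cache total: ${total_kb} KB ($((total_kb / 1024)) MB)"
```

With those values the caches would occupy a fixed 32 MB, which is what you would budget for in SHMVIRTSIZE under the documented behaviour.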

In 11.70.FC7W1 this parameter merely controls the initial size of these caches, which are then free to grow (and, I think, shrink below the initial size) as they see fit. To improve performance, memory can be allocated to these caches without having to free any first, and a separate thread deals with garbage collection (or drainage). (I hope I have explained this properly as I am not a programmer.) What is certain is that if your system is very busy, the caches grow faster than the garbage collection clears them down. If your system is very busy for a sustained period, they can grow and allocate memory until you hit SHMTOTAL, if you’ve set it. (I don’t think hitting this limit would be very pretty, because the instance would kick out sessions to free up memory, but this is not where the problem lies. Anyway, it would need testing and I haven’t done so.)

So can you monitor it? Yes you can, and I’d recommend that you do if you’re running 11.70.FC7W1 or above and have the VP cache switched on. This little code snippet does the job of calculating the total size in MB of all the VP caches for an instance:

#!/bin/bash
# Sum the per-bin block counts from onstat -g vpcache and report the
# total size in MB (the blocks are 4 kB, so 256 per MB).
vpcacheblks=0
for blks in $(onstat -g vpcache | grep -E '^    [0-9]' | awk '{ print $2 }'); do
    vpcacheblks=$((vpcacheblks + blks))
done
vpcachemb=$(echo "scale=1; ${vpcacheblks} / 256" | bc)
echo "${vpcachemb}"

You can also use the output from onstat -g vpcache to work out the number of missed drains using the formula (free - alloc) - drains.
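As a sketch of that calculation (the counter values here are invented; read the real free, alloc and drains figures from onstat -g vpcache on your own system):

```shell
#!/bin/bash
# Hypothetical counters as read from onstat -g vpcache output.
free_blocks=120000
alloc_blocks=95000
drains=20000
# Missed drains per the formula in the text: (free - alloc) - drains.
missed=$(( (free_blocks - alloc_blocks) - drains ))
echo "Missed drains: ${missed}"
```

A persistently rising missed-drains figure suggests the garbage collection is not keeping up with allocation on your system.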

If you have a busy system and particularly one with heavy peak periods, graphing the size of this over time is very interesting. Equally if your system is not that busy, you may see flat memory usage. It’s worth knowing which applies to you.
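One way to collect the data for graphing is to log a timestamped reading every few minutes (from cron, say). This is a sketch only: the vpcache_total stub below stands in for the snippet earlier in the article, so the example runs anywhere, and the CSV filename is made up:

```shell
#!/bin/bash
# Append a timestamped VP cache total to a CSV for graphing over time.
# vpcache_total is a stub standing in for the real onstat-based snippet.
vpcache_total() { echo "123.4"; }   # replace with the real measurement
logfile=vpcache_usage.csv
echo "$(date '+%Y-%m-%dT%H:%M:%S'),$(vpcache_total)" >> "$logfile"
tail -1 "$logfile"
```

The resulting CSV plots easily in gnuplot or a spreadsheet, making peak-period growth obvious.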

So if you’re reading this article and, having done a bit of investigation on your own system, have found that it affects you, what can you do to mitigate? Here are some options:

  • Downgrade to 11.70.FC7 or earlier.
  • Set VP_MEMORY_CACHE_KB to 0. You can actually do this dynamically using onmode -wm to clear out the cache and then reset it to its original value immediately afterwards.
  • Increase SHMVIRTSIZE to accommodate the growth. Of course you need spare memory in your server to do this.
  • Set SHMTOTAL to remove the possibility of your server swapping. If you do, also look at setting up the low memory manager.

So what are IBM doing about the situation? An APAR has been raised as follows:

IC95684 AFTER THE FIX FOR IC90645 (11.70.FC7W1 AND NEWER) THE VP PRIVATE CACHES CAN GROW UNCONTROLLABLY ON BUSY SYSTEMS

This should result (in 11.70.FC8) in the documentation being updated and a choice of STATIC or DYNAMIC modes becoming available for the VP caches. DYNAMIC will be the same as the new behaviour, and STATIC will be more similar to how things were previously, when the VP caches were a fixed size. Note I said more similar, not the same. It will be interesting to look at how this behaves when it’s available.

There’s also another issue touched on here, which I’ve logged via the new request for enhancement (RFE) site: onstat -g mem does not include the VP cache sizes in its output and is therefore not a complete view of all the memory pools in your instance. The RFE requests that it should.


10 Comments on “Monitoring virtual segment usage and the CPU VP caches”

  1. Ben Thompson says:

    IBM have posted a support article about the new ‘memory’ thread online: http://www.ibm.com/support/docview.wss?uid=swg21657248

  2. tgirsch says:

    From the IDS 11.70.FC8 release notes:

    Dynamic private memory caches for CPU virtual processors

    Private memory caches for CPU virtual processors now change size automatically as needed. You create private memory caches by setting the VP_MEMORY_CACHE_KB configuration parameter to the initial total size of the caches. The size of a private memory cache increases and decreases automatically, depending on the needs of the associated CPU virtual processor. Previously, the size of private memory caches was limited to the value of the VP_MEMORY_CACHE_KB configuration parameter. You can preserve the previous behavior by including a comma and the word STATIC after the size value of the VP_MEMORY_CACHE_KB configuration parameter.

    The onstat -g vpcache command now shows the target size for each bin in the cache before draining starts and the last time that each bin was drained.

    Note that the release notes no longer ship with the engine for some reason. The machine notes are in $INFORMIXDIR/release/en_us/0333 (or whatever your localization is) but the release notes are not. I had to go on-line to find them. Here, by the way:

    http://pic.dhe.ibm.com/infocenter/idshelp/v117/topic/com.ibm.po.doc/new_features.htm

    • Ben Thompson says:

      Hi tgirsch,

      Just to be clear: there are three versions of the CPU VP cache around:

      11.70.FC7 and older: original behaviour where memory usage is fixed.
      11.70.FC7W1, 11.70.FC7W2, 11.70.FC7W3: these have what I’ve described above with potentially better performance but significantly more memory usage on busy systems.
      11.70.FC8: choice of STATIC or DYNAMIC modes.

      IBM only do release notes for fix packs, not PID drops so the changed behaviour in W1, W2 and W3 is not documented anywhere. I felt this was worthy of a blog post because you may not be expecting the effects of the redesigned cache when using these versions. If you find your virtual segment is being gobbled it’s not obvious that the CPU memory cache may be the culprit, particularly when the documentation prior to FC8 categorically states that it’s a fixed size.

      The new features doc you’ve linked to describes the FC8 behaviour. The choice of STATIC/DYNAMIC modes means that you’re trading memory usage against performance a little on busy systems.

      Ben.

      • tgirsch says:

        IBM used to ALWAYS include release notes with the product. They came with the full 11.70.FC7 bundle, and with every previous version. They also came with the FC7W1 and FC7W2 releases, though I may have downloaded those as fixpacks rather than full releases.

        Also, I was unclear in my message above. Your OP claimed that the fix should go in with IDS 11.70.FC8. The purpose of my message was to confirm that in fact, it did go in.

      • Ben Thompson says:

        Thanks tgirsch.

  3. tgirsch says:

    Unfortunately, starting with IDS 11.70.FC7W2 (and 12.10.FC2), IBM broke the to_date function, so we’re waiting for a patch to IDS 11.70.FC8 before we can upgrade beyond IDS 11.70.FC7W1.

    Details:

    consider the following two SQL statements:

    SELECT to_date("20131103T031752.147", "%Y%m%dT%H%M%S%F3") AS dt
    FROM systables
    WHERE tabid = 1;

    SELECT to_date("20131103T031752.147", "%Y%m%dT%H%M%S.%F3") AS dt
    FROM systables
    WHERE tabid = 1;

    The first statement works on 11.70.FC7W1/12.10.FC1 and earlier, but gives error -1277 on later versions.
    The second statement works on 11.70.FC7W2/12.10.FC2 and later, but gives error -1271 on earlier versions.

  4. tgirsch says:

    I just tested in IDS 11.70.FC7W2, and the listed environmental variable makes no difference.

