Forest of trees

Posted: 7 September, 2018 | Author: Ben Thompson | Filed under: Instance and operating system tuning, SQL query tuning, Storage schema management | Tags: forest of trees, FOT

Introduction

Forest of trees (FOT) indices were a new feature in 11.70 and are designed to avoid multiple sqlexec threads competing for access to the same index root node (top level of the index) and wasting CPU cycles. When a thread cannot access the root node because it is latched by another thread, it will spin and re-try. This informs the approach taken by the Informix official documentation to finding where a FOT will be beneficial: checking for database objects (that happen to be indices) with the most spin locks.

You can create a FOT like this:
CREATE INDEX fotidx ON tab(c1) hash on (c1) with 100 buckets;

The idea behind FOTs is to split the index into many “buckets”, each with its own root node, improving concurrency by reducing competition between threads. A hash function on a specified column determines which bucket is used when retrieving or placing a row; in most cases this can only practicably be a hash on the leading column of the index. It follows that the leading column needs to have a lot of distinct values and not be skewed, otherwise the hash function will return the same value frequently, negating the potential benefits. Certainly it is pointless to have more buckets than you have distinct values on the hashed column and, unless you want a set of index buckets each consisting of a single value, you’ll want the number of buckets to be a fraction of the number of distinct values, since different column values can of course hash to the same bucket.
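To make the point about distinct values and skew concrete, here is a minimal Python sketch. It does not use Informix's real hash function, which is not described here; it assumes a simple modulo on an integer column as a stand-in, and the function names and sample data are made up for illustration.

```python
from collections import Counter

def bucket_counts(values, buckets):
    """Count rows per bucket, using value % buckets as a stand-in hash (an assumption)."""
    counts = Counter(v % buckets for v in values)
    return [counts.get(b, 0) for b in range(buckets)]

def summarise(values, buckets):
    """Report how many buckets are used and the share of rows in the busiest one."""
    counts = bucket_counts(values, buckets)
    used = sum(1 for c in counts if c > 0)
    return {"buckets_used": used, "largest_share": max(counts) / len(values)}

# Many distinct, evenly spread customer ids: every bucket gets work.
even = list(range(1, 100_001))
# Skewed data: 90% of the rows share one value, so one bucket dominates.
skewed = [42] * 90_000 + list(range(1, 10_001))
# Too few distinct values: only 10 of the 100 buckets can ever be used.
few = [v % 10 for v in range(100_000)]

for name, data in (("even", even), ("skewed", skewed), ("few distinct", few)):
    print(name, summarise(data, 100))
```

With 100 buckets, the evenly spread data puts 1% of the rows in each bucket, the skewed data puts roughly 90% of the rows in a single bucket, and the data with ten distinct values leaves 90 buckets empty.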

If you’ve so far ignored this feature, do you need it? What are the benefits of implementing a FOT, given there are some potential drawbacks such as not being able to do range scans on the hashed column? What other symptoms are there? And is the implementation as good as it could be?

Do I need one?

As the problem FOTs attempt to solve is competition between threads for access to the same index root nodes, it follows that you are unlikely to ever need one on systems with one or only a few CPU virtual processors (CPU VPs). Without multiple CPU VPs it is impossible for multiple sqlexec threads to be accessing the same buffers at the same time. It also follows that you need a fair number of database connections and for these to be doing similar things. By similar I don’t mean exactly the same, because then the threads would still compete for the same root node. A good example of where a FOT would work well is where multiple sessions select all the data for a customer by customer id: the same query is being run in many sessions but the customer id is different in each. It’s worth mentioning that this contention occurs purely in memory on the database server; the performance problem the feature addresses is not related to fetching rows from storage.

What are the benefits?

At the instance level, a large number of spin locks manifests itself as high CPU usage without the database server doing much useful work. Of course there can be many causes of high CPU usage such as poor SQL, poor query plans, slow disk etc. but a feature of systems suffering this kind of contention is that they don’t scale linearly. If an index is not very busy the FOT is not giving you any benefits over a standard B-Tree index but as load ramps up and an index is used more intensively by multiple concurrent threads, spin waits with a standard B-Tree can increase dramatically consuming CPU cycles for no real gain.

What other symptoms are there of spin waits caused by index root node contention?

CPU load suddenly jumping up when load hits a certain level.
A large number of buffer waits (‘onstat -X’) on the index concerned.
Many running sqlexec threads are all running the same SQL statements. You may see these threads in this state:
tid      name     rstcb      flags    curstk   status
5362439  sqlexec  1f519b358  ---PR--  11888    running-
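The last symptom can be checked roughly with a few lines of Python once you have gathered, by whatever means you prefer, the status and current statement of each sqlexec thread. The sketch below is only an illustration: the input list of (status, SQL text) pairs, the function names and the 50% threshold are all assumptions, and nothing here parses real onstat output.

```python
import re
from collections import Counter

def normalise(sql):
    """Lower-case, collapse whitespace and replace integer literals with '?'."""
    text = " ".join(sql.lower().split())
    return re.sub(r"\b\d+\b", "?", text)

def hot_statements(sessions, min_share=0.5):
    """Return (statement, count) pairs run by at least min_share of running threads.

    `sessions` is a hypothetical list of (status, sql_text) pairs collected by you.
    """
    running = [normalise(sql) for status, sql in sessions if status.startswith("running")]
    if not running:
        return []
    counts = Counter(running)
    return [(sql, n) for sql, n in counts.most_common() if n / len(running) >= min_share]

# Made-up sample: three running threads differ only in the customer id.
sample = [
    ("running-", "SELECT * FROM customer WHERE customer_id = 101"),
    ("running-", "select * from customer where customer_id = 202"),
    ("running-", "SELECT * FROM customer  WHERE customer_id = 303"),
    ("cond wait", "UPDATE stock SET qty = 5"),
    ("running-", "SELECT COUNT(*) FROM orders"),
]
print(hot_statements(sample))
```

Normalising the literals matters because, as in the customer id example, the statements are the same apart from the value being looked up.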

Is the implementation as good as it could be?

There are a couple of missing features in the implementation, both around the creation of the index itself rather than accessing or updating it:

While you can allocate more memory to the session building the index by setting PDQPRIORITY, building a FOT is always a single-threaded operation, meaning it can be slow with large tables.
Rebuilding an index as a FOT via “alter fragment on index” is not supported, meaning you can’t rebuild the index and leave the logical schema intact. Instead you must drop and recreate the index. This isn’t too painful if the index is there for performance reasons only, but if it’s supporting constraints you need to drop and recreate these too and, if it’s a primary key, any referencing foreign keys will be implicitly dropped. You can find any foreign keys referencing a given table with:
SELECT t.tabname, c.constrname
FROM sysconstraints c, systables t
WHERE c.tabid = t.tabid
AND c.constrid IN (
  SELECT constrid FROM sysreferences
  WHERE ptabid IN (
    SELECT tabid FROM systables WHERE tabname = 'XXXXXX'
  )
);
Then use dbschema to fetch the SQL to recreate the foreign keys.

If you think these improvements to the implementation would be beneficial to you, there is an IBM RFE you can vote for.

Summary

This is a feature that you may never need, but it is aimed at a specific type of performance problem and solves it very effectively.

2 Comments on “Forest of trees”

Vercelletto Eric (@EricVercelletto) says:

7 September, 2018 at 16.01

Great post Ben! One more piece of functionality that Informix has had for some time and that few people want to use. Definitely a great asset when an index has a lot of concurrent accesses!

- Ben Thompson says:
  
  7 September, 2018 at 16.36
  
  It’s this kind of feature that makes it a proper enterprise DB engine.
  
