Partitioning by range

Partitioning or fragmenting large tables is a great Informix feature. The ability to fragment by an arbitrary SQL expression or round robin provides more flexibility than is found in many competing RDBMSs.

One thing it never catered for especially well, though, was large log tables that include a timestamp of some description. Most database schemas I have seen have at least one of these: it is usually some kind of application or audit log; it’s generally one of the largest tables in your schema; and after a while there will probably be a requirement to purge the oldest entries from it.

Prior to version 11.70, for these tables – if you wanted to employ fragmentation – you had several options:

Fragment by round-robin
This was the easiest option: it meant you could spread this large table over several dbspaces (and therefore disks or controllers) and make good use of parallel data queries when you needed to.
Fragment by expression using specific date ranges
This was a good option: you kept the benefits of round-robin fragmentation and added the possibility of fragment elimination on suitable queries. (This is where the Informix optimiser knows that certain table fragments won’t contain any data your query is looking for because the table storage schema says so.) Additionally, if any indices on this table, including those used to enforce constraints, followed the same storage schema, purging could be achieved by dropping the oldest fragment instead of thrashing away deleting data and taxing your logical logs and the BTree scanner. The major drawback of this method was that, if you weren’t careful, you could run out of fragments, since you needed to allocate the fragments – and the disk space for them – ahead of time.
Fragment by expression with a function
One way of avoiding the drawback of needing to allocate fragments was to use some sort of storage schema like:
partition by expression
partition january (month(timestamp) = 1) in dbspace1,
partition february (month(timestamp) = 2) in dbspace2,
...

This wasn’t particularly satisfactory either, as you missed out on fragment elimination: the optimiser wasn’t clever enough to realise that your data for February 2011 existed in the February partition unless your SQL statement included the condition month(timestamp) = 2; something like timestamp = '2011-02-28 09:46:00' wouldn’t do. It was also useless from a purging perspective unless you wanted to purge everything for a particular month, regardless of the year.
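For comparison, a sketch of the explicit date-range approach (the second option above) might have looked something like this – the dbspace names and the choice of monthly boundaries are purely illustrative:

```sql
-- Hypothetical explicit date-range storage schema; every future month
-- must be added (and its dbspace allocated) before data for it arrives.
create table log_table (
    id serial not null,
    timestamp datetime year to second not null,
    loginfo varchar(254)
)
fragment by expression
    (timestamp <  datetime(2011-02-01 00:00:00) year to second) in dbspace1,
    (timestamp >= datetime(2011-02-01 00:00:00) year to second
     and timestamp < datetime(2011-03-01 00:00:00) year to second) in dbspace2,
    (timestamp >= datetime(2011-03-01 00:00:00) year to second) in dbspace3;
```

The last fragment acts as a catch-all for newer data, which is exactly the fragment you would have to keep splitting as time moved on.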

Version 11.70 gives us the best of all these worlds through the new partition by range option. We can create an example log table using this SQL:

drop table if exists log_table;

create table log_table (
id serial not null,
timestamp datetime year to second not null,
loginfo varchar(254)
)
partition by range (timestamp) interval (1 units month) store in (maindbs)
partition initial_log_table values < (cast('2011-01-01 00:00:00' as datetime year to second)) in maindbs
extent size 8192 next size 2048 lock mode row;

alter table log_table add constraint primary key (id) constraint pk_log_table;
create index ix_timestamp on log_table (timestamp);

I’ve just opted to put all my fragments in maindbs for simplicity, but a comma-separated list of dbspaces can be supplied in the store in clause, which will then be used in a round-robin fashion. Because I’ve not given the index ix_timestamp its own storage clause, it will be stored in the same way as the table, as we’ll see below. I’ve also used the new if {not} exists clause available in 11.70, which is a very welcome addition.
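The round-robin variant of the store in clause might look like this – the dbspace names here are illustrative:

```sql
-- Interval fragments are placed across three dbspaces in turn
-- (hypothetical dbspace names):
partition by range (timestamp) interval (1 units month)
    store in (logdbs1, logdbs2, logdbs3)
partition initial_log_table
    values < (cast('2011-01-01 00:00:00' as datetime year to second)) in logdbs1
```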

We need to be careful to avoid error 25803 (“Interval or list fragmentation must have at least one non-null, non-remainder fragment”), which can occur if we don’t include the initial fragment I’ve included above (or something like it).
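For illustration, a variant like this – with no initial fragment at all – is the kind of statement that triggers the error:

```sql
-- Hypothetical: no "partition ... values < (...)" clause is supplied,
-- so there is no non-null, non-remainder fragment and the create fails
-- with error 25803.
create table bad_log_table (
    id serial not null,
    timestamp datetime year to second not null,
    loginfo varchar(254)
)
partition by range (timestamp) interval (1 units month) store in (maindbs);
```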

Let’s insert some data using this procedure (much nicer than hundreds of insert statements):

drop procedure if exists insert_data;
create procedure insert_data ()

define p_timestamp datetime year to second;
define p_logno smallint;

let p_timestamp = cast('2011-01-01 00:00:00' as datetime year to second);
let p_logno = 1;

while p_timestamp < current

insert into log_table (timestamp, loginfo) values (p_timestamp, 'Log entry no. ' || p_logno);

let p_timestamp = p_timestamp + 977 units minute;
let p_logno = p_logno + 1;

end while;

end procedure;

execute procedure insert_data();

By running oncheck -pe we can see that this has resulted in the creation of several partitions:

testdb:'informix'.log_table 1535 4096
testdb:'informix'. 127_75 5631 199
testdb:'informix'.ix_timestamp 5830 208
testdb:'informix'.log_table 6038 4096
testdb:'informix'.ix_timestamp 10134 208
testdb:'informix'.log_table 10342 4096
testdb:'informix'.ix_timestamp 14438 208
testdb:'informix'.log_table 14646 4096
testdb:'informix'.ix_timestamp 18742 208
testdb:'informix'.log_table 18950 4096
testdb:'informix'.ix_timestamp 23046 208
testdb:'informix'.log_table 23254 4096
testdb:'informix'.ix_timestamp 27350 208
testdb:'informix'.log_table 27558 4096
testdb:'informix'.ix_timestamp 31654 208
testdb:'informix'.sysfragments 31862 8
testdb:'informix'.log_table 31870 4096
testdb:'informix'.ix_timestamp 35966 208
testdb:'informix'.log_table 36174 4096
testdb:'informix'.ix_timestamp 40270 208
testdb:'informix'.log_table 40478 4096
testdb:'informix'.ix_timestamp 44574 208
testdb:'informix'.log_table 44782 4096
testdb:'informix'.ix_timestamp 48878 208
testdb:'informix'.log_table 49086 4096
testdb:'informix'.ix_timestamp 53182 208

The above shows that we have a partition for each month and also an index partition for each month.

We can get more information by querying the sysfragments table:

unload to 'fragments.txt' delimiter '|'
select fragtype, indexname, partition, exprtext
from sysfragments
where tabid = 127;

The tabid for our table is 127 – we can get this by querying systables. This gives us:

T|||timestamp|
T|||interval( 1) month(9) to month|
T|||maindbs|
T||initial_log_table|VALUES < datetime(2011-01-01 00:00:00) year to second|
I| 127_75|maindbs||
I|ix_timestamp||timestamp|
I|ix_timestamp||interval( 1) month(9) to month|
I|ix_timestamp||maindbs|
I|ix_timestamp|initial_log_table|VALUES < datetime(2011-01-01 00:00:00) year to second|
T||sys_p1|VALUES >= datetime(2011-01-01 00:00:00) year to second AND VALUES < datetime(2011-02-01 00:00:00) year to second|
I|ix_timestamp|sys_p1|VALUES >= datetime(2011-01-01 00:00:00) year to second AND VALUES < datetime(2011-02-01 00:00:00) year to second|
T||sys_p2|VALUES >= datetime(2011-02-01 00:00:00) year to second AND VALUES < datetime(2011-03-01 00:00:00) year to second|
I|ix_timestamp|sys_p2|VALUES >= datetime(2011-02-01 00:00:00) year to second AND VALUES < datetime(2011-03-01 00:00:00) year to second|
T||sys_p3|VALUES >= datetime(2011-03-01 00:00:00) year to second AND VALUES < datetime(2011-04-01 00:00:00) year to second|
I|ix_timestamp|sys_p3|VALUES >= datetime(2011-03-01 00:00:00) year to second AND VALUES < datetime(2011-04-01 00:00:00) year to second|
T||sys_p4|VALUES >= datetime(2011-04-01 00:00:00) year to second AND VALUES < datetime(2011-05-01 00:00:00) year to second|
I|ix_timestamp|sys_p4|VALUES >= datetime(2011-04-01 00:00:00) year to second AND VALUES < datetime(2011-05-01 00:00:00) year to second|
T||sys_p5|VALUES >= datetime(2011-05-01 00:00:00) year to second AND VALUES < datetime(2011-06-01 00:00:00) year to second|
I|ix_timestamp|sys_p5|VALUES >= datetime(2011-05-01 00:00:00) year to second AND VALUES < datetime(2011-06-01 00:00:00) year to second|
T||sys_p6|VALUES >= datetime(2011-06-01 00:00:00) year to second AND VALUES < datetime(2011-07-01 00:00:00) year to second|
I|ix_timestamp|sys_p6|VALUES >= datetime(2011-06-01 00:00:00) year to second AND VALUES < datetime(2011-07-01 00:00:00) year to second|
T||sys_p7|VALUES >= datetime(2011-07-01 00:00:00) year to second AND VALUES < datetime(2011-08-01 00:00:00) year to second|
I|ix_timestamp|sys_p7|VALUES >= datetime(2011-07-01 00:00:00) year to second AND VALUES < datetime(2011-08-01 00:00:00) year to second|
T||sys_p8|VALUES >= datetime(2011-08-01 00:00:00) year to second AND VALUES < datetime(2011-09-01 00:00:00) year to second|
I|ix_timestamp|sys_p8|VALUES >= datetime(2011-08-01 00:00:00) year to second AND VALUES < datetime(2011-09-01 00:00:00) year to second|
T||sys_p9|VALUES >= datetime(2011-09-01 00:00:00) year to second AND VALUES < datetime(2011-10-01 00:00:00) year to second|
I|ix_timestamp|sys_p9|VALUES >= datetime(2011-09-01 00:00:00) year to second AND VALUES < datetime(2011-10-01 00:00:00) year to second|
T||sys_p10|VALUES >= datetime(2011-10-01 00:00:00) year to second AND VALUES < datetime(2011-11-01 00:00:00) year to second|
I|ix_timestamp|sys_p10|VALUES >= datetime(2011-10-01 00:00:00) year to second AND VALUES < datetime(2011-11-01 00:00:00) year to second|
T||sys_p11|VALUES >= datetime(2011-11-01 00:00:00) year to second AND VALUES < datetime(2011-12-01 00:00:00) year to second|
I|ix_timestamp|sys_p11|VALUES >= datetime(2011-11-01 00:00:00) year to second AND VALUES < datetime(2011-12-01 00:00:00) year to second|

We can see from the bottom entries that Informix has created partition names for us, since it is managing the partitions itself. It also shows us an equivalent “fragment by expression” syntax, which it has automatically generated based on the interval we specified.

So can we drop partitions we don’t need without having to delete all the data? Yes, we can: we detach the unwanted partition for January 2011 as a new table, old_logs, and then drop it:

alter fragment online on table log_table detach partition sys_p1 old_logs;
drop table old_logs;

Note that the above is not an online operation and will require a table lock to work.
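One nice property of interval fragmentation – worth verifying on your own instance – is that the engine creates fragments on demand, so I’d expect a late-arriving row in a detached range simply to get a fresh fragment, rather than causing an error:

```sql
-- Hypothetical: after detaching sys_p1 (January 2011), inserting a
-- January row should cause the engine to create a new fragment for
-- that interval automatically.
insert into log_table (timestamp, loginfo)
    values (cast('2011-01-15 12:00:00' as datetime year to second),
            'Late-arriving January entry');
```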

Does fragment elimination work? Yes: we’re now just using an automatically managed version of fragment by expression, where fragment elimination has always worked, as this shows:

set explain on;
select * from log_table
where timestamp between '2011-08-01 00:00:00' and '2011-08-02 00:00:00';

Looking at the query plan we have:

QUERY: (OPTIMIZATION TIMESTAMP: 11-23-2011 21:01:58)
------
select * from log_table where timestamp between '2011-08-01 00:00:00' and '2011-08-02 00:00:00'

Estimated Cost: 2
Estimated # of Rows Returned: 49

1) informix.log_table: INDEX PATH

(1) Index Name: informix.ix_timestamp
Index Keys: timestamp (Serial, fragments: 8 )
Fragments Scanned: (8) sys_p8 in maindbs
Lower Index Filter: informix.log_table.timestamp >= datetime(2011-08-01 00:00:00) year to second
Upper Index Filter: informix.log_table.timestamp <= datetime(2011-08-02 00:00:00) year to second

The above shows that the engine only looked in fragment no. 8.

So all in all, an excellent feature that allows us to:

  • Fragment our data by date range or another interval without having to worry about running out of partitions for future data.
  • Control where these fragments go, using a single dbspace or a round-robin approach.
  • Achieve fragment elimination with suitable queries, reducing contention and improving performance.
  • Purge old data by detaching fragments instead of running delete from statements, which can thrash the logical logs and BTree scanner.
  • Make use of parallel data query.

TCP poll listeners: run as a NET or CPU virtual process?

At the recent Avnet Informix Community Event, Jon Ritson did an excellent presentation about hot topics including performance tuning. Many of the tips were not new but one that got me thinking was his suggestion that TCP poll threads should run on a network virtual processor (VP) rather than a CPU one. This seemed to go against previous advice I’ve read elsewhere.

Jon went on to say that running the poll threads in this way allowed them to be scheduled in a way that took up fewer resources than if they ran on a CPU VP. In the latter case we would expect the response time to be marginally better, but at the cost of more processing power for not much gain. At least, I think that’s what he said.

The job of TCP poll threads is to listen for new data on existing connections and notify other threads. I think what’s being suggested here is that if it takes a few clock cycles longer for the database engine to detect that your SQL statement is waiting, it might not matter too much.

So what to do? Refer to the IBM documentation and do some experimenting! One thing I noticed is that the 11.50 documentation is slightly clearer about it than the version 10.00 docs were:

Poll threads can run either on CPU virtual processors or on network virtual processors. In general, and particularly on a single-processor computer, poll threads run more efficiently on CPU virtual processors. This might not be true, however, on a multiprocessor computer with many remote clients.

And under a note labelled important (which was only labelled note in the 10.00 documentation) it adds:

TCP connections must only be in network virtual processors, and you must only have the minimum required to maintain responsiveness.

There’s a second statement there: have only the minimum required to maintain responsiveness. This again seems to be suggesting they use up resources.
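The choice between the two is made with the NETTYPE parameter in the ONCONFIG file; a sketch, with the poll-thread and connection counts purely illustrative:

```
# One TCP poll thread on a network VP, handling up to 150 connections:
NETTYPE soctcp,1,150,NET

# The alternative being discussed: run it on a CPU VP instead.
# NETTYPE soctcp,1,150,CPU
```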

You can monitor thread CPU usage with onstat -g cpu. Compare the output below: the first is from an idle instance running for one minute with the poll thread on a NET VP; the second has it running on a CPU VP. There are 222 wake-ups against 1310. I guess 222 wake-ups is enough to “maintain responsiveness”.

Net VP:

informix@ids1150srvr[demo_on]:/opt/IBM/informix/etc$ onstat -g cpu

IBM Informix Dynamic Server Version 11.50.UC4DE -- On-Line -- Up 00:01:00 -- 22556 Kbytes

Thread CPU Info:
tid name vp Last Run CPU Time #scheds status
2 lio vp 0 6lio* 11/13 10:57:24 0.0180 14 IO Idle
3 pio vp 0 7pio* 11/13 10:57:22 0.1270 3 IO Idle
4 aio vp 0 8aio* 11/13 10:58:07 0.1504 338 IO Idle
5 msc vp 0 9msc* 11/13 10:57:19 0.1838 5 IO Idle
6 aio vp 1 10aio* 11/13 10:57:24 0.1025 41 IO Idle
7 main_loop() 1cpu 11/13 10:58:13 0.0059 83 sleeping secs: 1
8 soctcppoll 11soc* 11/13 10:58:13 53.9641 222 running
9 soctcplst 1cpu* 11/13 10:57:19 0.0053 6 sleeping forever
10 soctcplst 1cpu* 11/13 10:57:19 0.0053 5 sleeping forever
11 flush_sub(0) 1cpu 11/13 10:58:13 0.0001 56 sleeping secs: 1
12 flush_sub(1) 1cpu 11/13 10:58:13 0.0001 56 sleeping secs: 1
13 flush_sub(2) 1cpu 11/13 10:58:13 0.0001 54 sleeping secs: 1
14 flush_sub(3) 1cpu 11/13 10:58:13 0.0001 54 sleeping secs: 1
15 flush_sub(4) 1cpu 11/13 10:58:13 0.0001 54 sleeping secs: 1
16 flush_sub(5) 1cpu 11/13 10:58:13 0.0001 54 sleeping secs: 1
17 flush_sub(6) 1cpu 11/13 10:58:13 0.0001 54 sleeping secs: 1
18 flush_sub(7) 1cpu 11/13 10:58:13 0.0001 54 sleeping secs: 1
19 aio vp 2 12aio* 11/13 10:57:24 0.1048 4 IO Idle
20 aslogflush 1cpu 11/13 10:58:13 0.0002 53 sleeping secs: 1
21 btscanner_0 1cpu 11/13 10:57:54 0.0077 15 sleeping secs: 1
37 onmode_mon 1cpu* 11/13 10:58:13 0.0003 52 sleeping secs: 1
38 periodic 1cpu 11/13 10:58:13 0.0013 54 sleeping secs: 1
44 dbScheduler 1cpu* 11/13 10:57:24 0.0394 203 sleeping secs: 251
45 dbWorker1 1cpu 11/13 10:57:24 0.0051 73 sleeping forever
46 dbWorker2 1cpu 11/13 10:57:24 0.0547 96 sleeping forever

CPU VP:

informix@ids1150srvr[demo_on]:/opt/IBM/informix/etc$ onstat -g cpu

IBM Informix Dynamic Server Version 11.50.UC4DE -- On-Line -- Up 00:01:00 -- 22556 Kbytes

Thread CPU Info:
tid name vp Last Run CPU Time #scheds status
2 lio vp 0 6lio* 11/13 10:55:36 0.0206 11 IO Idle
3 pio vp 0 7pio* 11/13 10:55:34 0.1220 3 IO Idle
4 aio vp 0 8aio* 11/13 10:56:19 0.2182 172 IO Idle
5 msc vp 0 9msc* 11/13 10:55:32 0.2904 3 IO Idle
6 aio vp 1 10aio* 11/13 10:55:36 0.1028 53 IO Idle
7 main_loop() 1cpu 11/13 10:56:26 0.0068 115 sleeping secs: 1
8 soctcppoll 1cpu* 11/13 10:56:26 54.0073 1310 running
9 soctcplst 1cpu* 11/13 10:55:32 0.0042 7 sleeping forever
10 soctcplst 1cpu* 11/13 10:55:32 0.0038 7 sleeping forever
11 flush_sub(0) 1cpu 11/13 10:56:26 0.0001 57 sleeping secs: 1
12 flush_sub(1) 1cpu 11/13 10:56:26 0.0001 57 sleeping secs: 1
13 flush_sub(2) 1cpu 11/13 10:56:26 0.0001 54 sleeping secs: 1
14 flush_sub(3) 1cpu 11/13 10:56:26 0.0001 54 sleeping secs: 1
15 flush_sub(4) 1cpu 11/13 10:56:26 0.0001 54 sleeping secs: 1
16 flush_sub(5) 1cpu 11/13 10:56:26 0.0001 54 sleeping secs: 1
17 flush_sub(6) 1cpu 11/13 10:56:26 0.0001 54 sleeping secs: 1
18 flush_sub(7) 1cpu 11/13 10:56:26 0.0035 55 sleeping secs: 1
19 aio vp 2 11aio* 11/13 10:55:36 0.0964 21 IO Idle
20 aslogflush 1cpu 11/13 10:56:26 0.0002 53 sleeping secs: 1
21 btscanner_0 1cpu 11/13 10:56:25 0.0079 17 sleeping secs: 24
37 onmode_mon 1cpu* 11/13 10:56:26 0.0003 53 sleeping secs: 1
38 periodic 1cpu 11/13 10:56:26 0.0012 55 sleeping secs: 1
44 dbScheduler 1cpu* 11/13 10:55:36 0.0289 305 sleeping secs: 250
45 dbWorker1 1cpu 11/13 10:55:36 0.0066 93 sleeping forever
46 dbWorker2 1cpu 11/13 10:55:36 0.0543 119 sleeping forever

I am not sure that the CPU times above are too important since top showed the system to be more or less idle during both tests.

I haven’t been able to benchmark this in a production system and it’s unlikely I’ll be able to. But I would like to say “thanks for the top tip, Jon”.