Large parallel index builds and temp space

This is a quick post about parallel index builds. Today I was building an unfragmented detached index, with PDQPRIORITY set, on a large table fragmented by range with ten large fragments, and I saw this message in the online log:

10:28:53 WARNING: Not enough temp space for parallel index build.
Space required = 110566014 pages; space available = 8385216 pages.
Partial index build started.

You can see I am quite a long way short of the temp space required here; I need just over thirteen times more.

In this instance I have eight temporary dbspaces available, all listed in the DBSPACETEMP onconfig parameter, and I have no local override. They are all 2 Gb with a 16 kb page size, so have 131072 pages each, and, as I am in single-user mode, I know they are all free. onstat -d confirms that 131019 pages of 131072 are free in each of them. In case it's relevant, I also have 1,027,203 2 kb pages free in rootdbs.
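Incidentally, rather than parsing onstat -d output, the free space can be totted up from the sysmaster database. Here is a minimal sketch, assuming the is_temp flag and the nfree/chksize columns behave as they do on my instances:

-- free and total pages per temporary dbspace
-- (page counts are in each dbspace's own page size)
select d.name, sum(c.nfree) as free_pages, sum(c.chksize) as total_pages
from sysmaster:sysdbspaces d, sysmaster:syschunks c
where c.dbsnum = d.dbsnum
and   d.is_temp = 1
group by d.name
order by d.name;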

The first thing that confuses me is the 8,385,216 pages the online log message says are available, which is more than I actually have: 131019 * 8 = 1048152. At first this looks like a bug, as it's out by a factor of exactly 8 (1,048,152 × 8 = 8,385,216). It's probably assuming a 2 kb page size somewhere, and my 16 kb dbspaces are an 8× multiple of this. I am using Linux, so is Informix counting in native page size units and just not making it clear?

The index I am creating is on a bigint field and there are 7,076,224,823 rows. If I assume 110,566,014 pages is also in 2 kb units, that is around 210 Gb, and the engine is allowing almost exactly 32 bytes/row of sort space, which sounds about right for an 8-byte key plus per-row overhead.

Anyway, despite the message in the online log, I am comforted by this IBM support article, which tells me:

You do not have to take action. This is a warning. The database server will create the index one fragment at a time, instead of all at once.

However, it does advise me that cancelling the statement, adding more temp space and starting again would be a good idea. This is prescient as we’ll see.

Analysing this now, it was probably always going to fail somewhere, because I need thirteen times more space and the engine can only divide the workload by working on a single fragment at a time. There are ten fragments and they are not all exactly the same size. In fact my largest fragment has 1,950,612,068 rows, over 27% of the total, and at 32 bytes/row the largest fragment my available space could handle would have only 536,653,818 rows. I suspect this means that to succeed I will need at least 30,478,314 2 kb pages (1,950,612,068 × 32 / 2048) available to complete the build. I hope this all makes sense anyway!
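If you want to run this check before kicking off a build, something like the sketch below would do it. It assumes the roughly 32 bytes of sort space per row estimated above, that sysmaster:sysptnhdr reports a row count per partition, and a placeholder table name:

-- estimated sort space (in 2 kb pages) needed per fragment
select f.partition, p.nrows,
       round((p.nrows * 32) / 2048, 0) as est_sort_pages
from sysfragments f, sysmaster:sysptnhdr p
where p.partnum = f.partn
and   f.fragtype = 'T'
and   f.tabid = (select tabid from systables
                 where tabname = 'my_large_table')  -- placeholder name
order by p.nrows desc;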

Foolhardily, and possibly because I get distracted by something, I leave it to run. More messages appear in the log as the build progresses:

11:22:33 WARNING: Not enough temp space for parallel index build.
Space required = 110566014 pages; space available = 8385216 pages.
Partial index build started.
12:19:28 WARNING: Not enough temp space for parallel index build.
Space required = 110566014 pages; space available = 8385216 pages.
Partial index build started.
13:27:03 WARNING: Not enough temp space for parallel index build.
Space required = 110566014 pages; space available = 8385216 pages.
Partial index build started.
13:47:56 Session Insufficient space in temporary dbspaces:
Creating the temporary table in the root dbspace,
Temporary table size is 17632 pages.

Nearly four hours after it began, at 14:27:41, my index build fails with:

212: Cannot add index.
179: ISAM error: no free disk space for sort

Harumph.

I guess there are two annoying things about this:

  1. The support article is only right if your largest fragment will not require more space than is available.
  2. The failure could have been foreseen at the beginning by looking at row counts.

Anyway, I hope this helps. If I get time I will do some more testing on this to confirm some of the assumptions I have made in writing this article. Feedback is welcome as ever (via https://informixdba.wordpress.com for those of you reading this on PlanetIDS)!


Foreign key creation performance

The Agile development model IBM has moved to for Informix development means that there are always a few extra features in each fix pack. There are two rather nice enhancements in 11.70.FC8 for building foreign keys which alleviate some pain points when doing administration on large tables. The first is a no validate option, which means that no checking of the data is done when creating the foreign key. This is appropriate in situations like migrating platforms, when you know that referential integrity is already assured and you want to avoid the overhead of re-checking it all again on the new system. I want to look instead at the other enhancement: improved performance when creating a foreign key.
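For reference, the no validate variant looks something like the sketch below; the table, column and constraint names are made up, so check the exact syntax for your fix pack:

-- add the foreign key without validating existing rows
alter table child_table add constraint
  foreign key (parent_id) references parent_table (parent_id)
  constraint child_table_fk novalidate;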

Nearly all of the time creating a foreign key constraint is spent checking the data; modifying the database schema to add the constraint is a very quick operation.

Prior to 11.70.FC8 creating a foreign key on a large table was a long process and with non-partitioned tables it was a single-threaded operation which did not benefit from the use of indices. Typically you’d see a thread working away like this:

tid      name     rstcb            flags    curstk   status
283      sqlexec  26e1fc3c78       --BPR--  22096    running-

Or with a fragmented table and PDQ turned on you might see multiple threads:

tid      name     rstcb            flags    curstk   status
324      sqlexec  26e1fc3c78       --BP---  19744    sleeping secs: 1     -
325      scan_1.0 26e1fc0a98       --B-R--  3248     running-
326      scan_1.1 26e1fc44c8       --B-R--  3248     running-
327      scan_1.2 26e1fc4d18       --B-R--  3248     running-
328      scan_1.3 26e1fc5568       --B-R--  3248     running-
329      scan_1.4 26e1fc5db8       --B-R--  3248     running-
330      scan_1.5 26e1fc6608       --B-R--  3248     running-
331      scan_1.6 26e1fc97e8       --B-R--  3248     running-
332      scan_1.7 26e1fcb0d8       --B-R--  3248     running-

Even with parallel operations foreign key builds could still take a long time.

With 11.70.FC8 foreign key builds are a lot faster, and here is a test case which you can try. I'm going to create a 10 million row table, which is large enough to demonstrate the problem, and a referenced table with up to 32,768 distinct values, and see how long it takes to create a foreign key.

Courtesy of the Informix FAQ I am going to create these procedures to generate random numbers.

CREATE PROCEDURE sp_setseed(n INTEGER)
  DEFINE GLOBAL seed DECIMAL(10) DEFAULT 1;

  LET seed = n;

END PROCEDURE;

CREATE PROCEDURE sp_random() RETURNING INTEGER;
  DEFINE GLOBAL seed DECIMAL(10) DEFAULT 1;
  DEFINE d DECIMAL(20,0);

  LET d = (seed * 1103515245) + 12345;
  -- MOD function does not handle 20-digit values
  LET seed = d - 4294967296 * TRUNC(d / 4294967296);

  RETURN MOD(TRUNC(seed / 65536), 32768);

END PROCEDURE;

These will generate a random number between 0 and 32767, and I'll use them to create some data to populate my tables with.
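To make runs repeatable you can seed the generator first, for example:

execute procedure sp_setseed(20130101);  -- any integer seed will do
execute procedure sp_random();           -- returns a value between 0 and 32767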

drop procedure if exists populate;
create procedure populate ()

  define p_wid int;
  define i int;

  drop table if exists referenced;
  create table referenced (
    widget_id int not null
  );

  drop table if exists referencing;
  create table referencing (
    id_column bigserial not null,
    widget_id int not null
  );

  FOR i = 1 TO 10000000 STEP 1

    let p_wid = sp_random();
    insert into referencing (widget_id) values (p_wid);

  END FOR;

  create index ireferencing_x1 on referencing (widget_id);
  insert into referenced (widget_id) select distinct widget_id from referencing;

  create unique index ireferenced_u1 on referenced (widget_id);
  alter table referenced add constraint primary key (widget_id) constraint referenced_pk;

end procedure;

execute procedure populate();

That sets up the test. It just remains to create the foreign key, which I’ll time:

alter table referencing add constraint foreign key (widget_id) references referenced (widget_id) constraint referencing_fk;
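For the parallel runs in the results table below, I wrapped the statement in the usual PDQ settings, along these lines:

set pdqpriority 100;  -- 0 for the serial runs

alter table referencing add constraint foreign key (widget_id)
  references referenced (widget_id) constraint referencing_fk;

set pdqpriority 0;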

And so for the results:

Informix version | Fragmented table                                       | PDQ priority | Parallel execution | Foreign key build time (s)
11.70.FC7        | No                                                     | 0            | No                 | 40
11.70.FC7        | No                                                     | 100          | No                 | 40
11.70.FC7        | Yes, 8-way partition by expression, evenly distributed | 0            | No                 | 40
11.70.FC7        | Yes, 8-way partition by expression, evenly distributed | 100          | Yes                | 40
11.70.FC7        | Yes, 8-way round-robin                                 | 0            | No                 | 41
11.70.FC7        | Yes, 8-way round-robin                                 | 100          | Yes                | 67
11.70.FC8        | No                                                     | 0            | No                 | 0.3
11.70.FC8        | No                                                     | 100          | No                 | 0.3
11.70.FC8        | Yes, 8-way partition by expression, evenly distributed | 0            | No                 | 0.5
11.70.FC8        | Yes, 8-way partition by expression, evenly distributed | 100          | Yes                | 40
11.70.FC8        | Yes, 8-way round-robin                                 | 0            | No                 | 0.9
11.70.FC8        | Yes, 8-way round-robin                                 | 100          | Yes                | 68

And for comparison here is the time taken for some related SQL queries (without PDQ):

Query                                                                                        | Duration (s)
select distinct widget_id from referencing;                                                  | 9
select widget_id, count(*) from referencing group by widget_id;                              | 7
select widget_id from referencing where widget_id not in (select widget_id from referenced); | 37

The purpose of this table is to show the relative times of the foreign key builds, not to benchmark any hardware or Informix itself, so I'm not going to divulge what hardware I used for the tests. The server did have multiple processors and CPU VPs configured so that PDQ queries could scale out, and the number of ready threads was zero or very low during the operations.

I ran all the tests a few times and there was a small amount of variation, so I've only quoted the results to low precision.

So what conclusions can we draw from this?

  • The stand-out result is that the verification of foreign keys is much faster in FC8 but only for serial operations.
  • Serial foreign key builds in FC8 are significantly faster than any SQL query I could write that checked the referential integrity.
  • In FC7 and earlier parallel operations may be slower.
  • In FC8 there is no equivalent speed improvement for parallel operations; they remain much slower than the new serial builds.

Another interesting finding is the large difference in performance between the expression-based fragmentation and round-robin fragmentation when using parallel operations. I had assumed that this kind of operation would be agnostic to the type of fragmentation used. I checked this a few times but examining the threads when it is running shows something different is definitely happening.

Round-robin fragmentation looks like the below. Notice that most of the threads are active:

tid      name     rstcb            flags    curstk   status
123535   sqlexec  15f2599e8        --BP---  29472    sleeping secs: 1     -
123536   scan_1.0 15f25e590        --B-R--  2400     running-
123537   scan_1.1 15f262068        Y-B----  496      cond wait  await_MC1 -
123538   scan_1.2 15f25c3f0        Y-B----  496      cond wait  await_MC1 -
123539   scan_1.3 15f261800        --B----  2400     running-
123540   scan_1.4 15f2628d0        --B-R--  2400     running-
123541   scan_1.5 15f263138        --B-R--  2400     running-
123542   scan_1.6 15f25edf8        --B-R--  2400     running-
123543   scan_1.7 15f25f660        --B----  2400     running-

Expression fragmentation seems to run with just one active thread for the majority of the time:

tid      name     rstcb            flags    curstk   status
123804   sqlexec  15f258918        --BP---  29472    sleeping secs: 1     -
123805   scan_1.0 15f25e590        --B-R--  2400     running-
123806   scan_1.1 15f25c3f0        Y-B----  496      cond wait  await_MC1 -
123807   scan_1.2 15f261800        Y-B----  496      cond wait  await_MC1 -
123808   scan_1.3 15f2628d0        Y-B----  496      cond wait  await_MC1 -
123809   scan_1.4 15f263138        Y-B----  496      cond wait  await_MC1 -
123810   scan_1.5 15f25edf8        Y-B----  496      cond wait  await_MC1 -
123811   scan_1.6 15f25f660        Y-B----  496      cond wait  await_MC1 -
123812   scan_1.7 15f2639a0        Y-B----  496      cond wait  await_MC1 -
123813   scan_1.8 15f260f98        Y-B----  496      cond wait  await_MC1 -

There is an extra scan thread (scan_1.8) because I have an (empty) remainder fragment in addition to my 8-way fragmentation.

This difference in thread activity would go some way to explaining why the two fragmentation methods perform so differently.

Finally, if I could wander into the realms of speculation for a moment, it does look like IBM has implemented the parallel foreign key constraint checking slightly differently for each fragmentation method, which might explain why the improved performance is only seen with serial operations.

To sum up, this is a wonderful enhancement, but I'll be careful to switch off parallel processing when building foreign keys. PDQ still greatly speeds up index builds on fragmented tables, which is the kind of operation you might run at the same time, so I can foresee scripts enabling and disabling parallel processing several times to get the best overall speed.
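Such a script might look like this sketch (the index, table and constraint names here are invented for illustration):

-- parallel for the index build, serial for the foreign key validation
set pdqpriority 100;
create index ifact_x1 on fact_table (dim_id);

set pdqpriority 0;
alter table fact_table add constraint
  foreign key (dim_id) references dim_table (dim_id)
  constraint fact_table_fk;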


Index fragment page limit

Once in a while something comes along to make a DBA’s life easier.

Spot the difference between these similar pages in the 11.70 Info Center and the new 12.10 IBM Knowledge Center.

The eagle-eyed amongst you will immediately spot the new line:

Maximum number of pages per index fragment: 2,147,483,647

This is a 128-fold improvement on the previous limit of 16,775,134 (the same as the limit on data pages per fragment), which capped an index fragment at just shy of 256 Gb in a 16 kb dbspace. The new limit of 32 Tb (again with a 16 kb dbspace) is much easier to work with. It applies only to detached indices.

Maybe it’s not the sexiest improvement but it actually arrived at the same time as storage pools in version 11.70 and so has been with us for a while. In my own test I was able to build an unfragmented detached index of over 32 Gb in a 2 kb dbspace in 11.70.FC7.

Should you rely (in version 11.70) on undocumented functionality? Without your own testing, maybe not. However, it’s good that this extra breathing space for DBAs exists and – with 12.10 – is documented and fully supported.


Converting fragment by expression to fragment by range

I really like the fragment by range method of partitioning, new in 11.70, which I have blogged about before, but are IBM missing a trick somewhere?

A common scenario is a table fragmented by expression on a date or datetime column, say with one partition per month, that has come across from your 11.50 or earlier instance. This sort of thing sounds ideal for the new fragment by range feature, as it automates the adding of a new partition each month, meaning that you don't have to do it yourself or via an application, or allocate loads of partitions up front, wasting valuable space.

So can we easily convert a fragment by expression table to a fragment by range table? By this I mean without row movement. Well, it seems not although I welcome anyone showing me how it should be done. Take this relatively simple index-less table:

create table conversion_test (
test_id serial not null,
ttimestamp datetime year to second not null
) fragment by expression
partition prehistory (ttimestamp < '2012-10-01 00:00:00') in dbspace1,
partition oct_2012 (ttimestamp < '2012-11-01 00:00:00') in dbspace2,
partition nov_2012 (ttimestamp < '2012-12-01 00:00:00') in dbspace3
extent size 1024 next size 256 lock mode row;

Before we go any further it's worth noting that I've chosen my fragmentation strategy fairly carefully to be compatible with the fragment by range syntax, which doesn't seem to accept operators other than '<' for the initial fragments, something the manual doesn't state.

Insert some data:

insert into conversion_test (ttimestamp) values ('2012-09-01 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-09-02 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-10-01 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-10-02 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-11-01 00:00:01');
insert into conversion_test (ttimestamp) values ('2012-11-02 00:00:01');

Let’s try and convert this to fragment by range:

alter fragment on table conversion_test init
fragment by range (ttimestamp) interval (1 units month) store in (dbspace4)
partition prehistory (ttimestamp < '2012-10-01 00:00:00') in dbspace1,
partition oct_2012 (ttimestamp < '2012-11-01 00:00:00') in dbspace2,
partition nov_2012 (ttimestamp < '2012-12-01 00:00:00') in dbspace3;

When running this I am hoping that:

  • The data do not move and the partitions remain as they are.
  • The only thing that changes is that if I now insert data for December 2012, a new partition is automatically created.

Does it work? Well, it runs. Unfortunately, though, row movement occurred, which we can see from the logical log below: the BLDCL records show new partitions being built and the HINSERT records show rows being re-inserted:

fffff018 56 BEGIN 26 10 0 11/05/2012 10:46:15 92 informix
fffff050 388 BLDCL 26 0 4018 100274 512 128 12 0 conversion_test
fffff1d4 52 CHALLOC 26 0 4050 00001:0000496520 200
fffff208 56 PTEXTEND 26 0 41d4 100274 511 00001:0000496520
fffff240 388 BLDCL 26 0 4208 100275 512 128 12 0 conversion_test
fffff3c4 52 CHALLOC 26 0 4240 00001:0000497032 200
fffff3f8 56 PTEXTEND 26 0 43c4 100275 511 00001:0000497032
fffff430 388 BLDCL 26 0 43f8 100276 512 128 12 0 conversion_test
fffff5b4 52 CHALLOC 26 0 4430 00001:0000497544 200
fffff5e8 56 PTEXTEND 26 0 45b4 100276 511 00001:0000497544
...
fffff6b8 76 HINSERT 26 0 4688 100274 101 12
fffff704 76 HINSERT 26 0 46b8 100274 102 12
fffff750 76 HINSERT 26 0 4704 100275 101 12
fffff79c 76 HINSERT 26 0 4750 100275 102 12
fffff7e8 76 HINSERT 26 0 479c 100276 101 12
fffff050 76 HINSERT 26 0 47e8 100276 102 12

I don't like that, as on a big table this would be a long logged operation, so I am going to cheat and show that this simple table can be switched to fragment by range by hacking around in sysfragments (don't do this at home, and definitely nowhere near a production system).

By cloning my database and performing the change to fragment by range there, I can see what the entries in the sysfragments table look like before and after, and unload them to files.

unload to conversion_test_sysfragments.unl
select * from sysfragments where tabid=(select tabid from systables where tabname='conversion_test');

I get three rows from before and six rows from after.

I now need to fiddle with the 'after' unload file a bit, putting the values for partn and exprbin from before the alter into the three original rows, as Informix changed these when it re-initialised the table; with my cheat method the partition numbers will not change. I then revert to my original database, which still has the table fragmented by expression, and load in the new sysfragments entries:

delete from sysfragments where tabid=109;

3 row(s) deleted.

load from conversion_test_sysfragments.unl insert into sysfragments;

6 row(s) loaded.

Dbschema shows the result immediately and it seems I don’t even have to bounce the engine:

{ TABLE "informix".conversion_test row size = 12 number of columns = 2 index size = 0 }

create table "informix".conversion_test
(
test_id serial not null ,
ttimestamp datetime year to second not null
)
fragment by range(ttimestamp) interval(interval( 1) month(9) to month) store in(dbspace4)
partition prehistory VALUES < datetime(2012-10-01 00:00:00) year to second in dbspace1,
partition oct_2012 VALUES < datetime(2012-11-01 00:00:00) year to second in dbspace2,
partition nov_2012 VALUES < datetime(2012-12-01 00:00:00) year to second in dbspace3
extent size 1024 next size 256 lock mode row;

And everything still appears to work in that I can select data and insert new rows, including rows for December 2012 and beyond where a new partition is automatically created.
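A quick sanity check along these lines (a sketch) is worth running before trusting the hacked catalog:

-- the six original rows should still be selectable
select count(*) from conversion_test;

-- a December row should spawn a new interval fragment automatically
insert into conversion_test (ttimestamp) values ('2012-12-02 00:00:01');

-- the new fragment should appear here
select partition, exprtext from sysfragments
where tabid = (select tabid from systables
               where tabname = 'conversion_test');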

So what does this post prove? It shows in principle that it's possible to convert a table fragmented by expression, with no indices, to one with an identical initial fragmentation strategy fragmented by range, at least where the original strategy is compatible with the range partitioning. I don't see in principle why this would not also work with indices in place, although I have not tested it. It seems like a nice feature to put into Informix at relatively low effort. One for V.next?


Partitioning by range

Partitioning or fragmenting large tables is a great Informix feature. The ability to fragment by an arbitrary SQL expression or round robin provides more flexibility than is found in many competing RDBMSs.

One thing it never catered for that well though was large log tables that include a timestamp of some description. Most database schemas I have seen have at least one of these: it is usually some kind of application or audit log; it’s generally one of the largest tables in your schema; and after a while there will probably be a requirement to purge the oldest entries from it.

Prior to version 11.70, for these tables – if you wanted to employ fragmentation – you had several options:

Fragment by round-robin
This was the easiest option and meant you could spread this large table over several dbspaces (and therefore disks or controllers) and also meant you could make good use of parallel data queries when you needed to.
Fragment by expression using specific date ranges
This was a good option in that you kept the benefits of round-robin fragmentation and added the possibility of fragment elimination on suitable queries. (This is where the Informix optimiser knows that certain table fragments can't contain any data your query is looking for, because the table storage schema says so.) Additionally, if any indices on this table, including those used to enforce constraints, followed the same storage schema, purging could be achieved by dropping the oldest fragment instead of thrashing away deleting data and taxing your logical logs and the BTree scanner. The major drawback of this method is that, if you weren't careful, you could run out of fragments, since you had to allocate the fragments, and the disk space for them, ahead of time.
Fragment by expression with a function
One way of avoiding the drawback of needing to allocate fragments was to use some sort of storage schema like:
partition by expression
partition january (month(timestamp) = 1) in dbspace1,
partition february (month(timestamp) = 2) in dbspace2,
...

This wasn't particularly satisfactory either, as you missed out on fragment elimination: the optimiser wasn't clever enough to realise that your data for February 2011 existed in the February partition unless your SQL statement included the condition month(timestamp) = 2; something like timestamp = '2011-02-28 09:46:00' wouldn't do. It was also useless from a purging perspective unless you wanted to purge everything for a particular month, regardless of the year.

Version 11.70 gives us the best of all these worlds through the new partition by range option. We can create an example log table using this SQL:

drop table if exists log_table;

create table log_table (
id serial not null,
timestamp datetime year to second not null,
loginfo varchar(254)
)
partition by range (timestamp) interval (1 units month) store in (maindbs)
partition initial_log_table values < (cast('2011-01-01 00:00:00' as datetime year to second)) in maindbs
extent size 8192 next size 2048 lock mode row;

alter table log_table add constraint primary key (id) constraint pk_log_table;
create index ix_timestamp on log_table (timestamp);

I've just opted to put all my fragments in maindbs for simplicity, but a comma-separated list of dbspaces can be supplied in the store in clause, which will then be used in a round-robin fashion. Because I've not given the index ix_timestamp its own storage clause, it will be stored in the same way as the table, as we'll see below. I've also used the new if {not} exists clause available in 11.70, which is a very welcome addition.

We need to be careful to avoid the error 25803: Interval or list fragmentation must have at least one non-null, non-remainder fragment which can occur if we don’t include the initial fragment I’ve included above (or something like it).

Let’s insert some data using this procedure (much nicer than hundreds of insert statements):

drop procedure if exists insert_data;
create procedure insert_data ()

define p_timestamp datetime year to second;
define p_logno smallint;

let p_timestamp = cast('2011-01-01 00:00:00' as datetime year to second);
let p_logno = 1;

while p_timestamp < current

insert into log_table (timestamp, loginfo) values (p_timestamp, 'Log entry no. ' || p_logno);

let p_timestamp = p_timestamp + 977 units minute;
let p_logno = p_logno + 1;

end while;

end procedure;

execute procedure insert_data();
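Before looking at the storage, it's worth a quick check of what the procedure generated:

-- one row every 977 minutes from 1 January 2011 up to now
select count(*) as nrows,
       min(timestamp) as earliest,
       max(timestamp) as latest
from log_table;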

By running oncheck -pe we can see that this has resulted in the creation of several partitions:

testdb:'informix'.log_table 1535 4096
testdb:'informix'. 127_75 5631 199
testdb:'informix'.ix_timestamp 5830 208
testdb:'informix'.log_table 6038 4096
testdb:'informix'.ix_timestamp 10134 208
testdb:'informix'.log_table 10342 4096
testdb:'informix'.ix_timestamp 14438 208
testdb:'informix'.log_table 14646 4096
testdb:'informix'.ix_timestamp 18742 208
testdb:'informix'.log_table 18950 4096
testdb:'informix'.ix_timestamp 23046 208
testdb:'informix'.log_table 23254 4096
testdb:'informix'.ix_timestamp 27350 208
testdb:'informix'.log_table 27558 4096
testdb:'informix'.ix_timestamp 31654 208
testdb:'informix'.sysfragments 31862 8
testdb:'informix'.log_table 31870 4096
testdb:'informix'.ix_timestamp 35966 208
testdb:'informix'.log_table 36174 4096
testdb:'informix'.ix_timestamp 40270 208
testdb:'informix'.log_table 40478 4096
testdb:'informix'.ix_timestamp 44574 208
testdb:'informix'.log_table 44782 4096
testdb:'informix'.ix_timestamp 48878 208
testdb:'informix'.log_table 49086 4096
testdb:'informix'.ix_timestamp 53182 208

The above shows that we have a partition for each month and also an index partition for each month.

We can get more information by querying the sysfragments table:

unload to 'fragments.txt' delimiter '|'
select fragtype, indexname, partition, exprtext
from sysfragments
where tabid=127;

The tabid for our table is 127 – we can get this by querying systables. This gives us:

T|||timestamp|
T|||interval( 1) month(9) to month|
T|||maindbs|
T||initial_log_table|VALUES < datetime(2011-01-01 00:00:00) year to second|
I| 127_75|maindbs||
I|ix_timestamp||timestamp|
I|ix_timestamp||interval( 1) month(9) to month|
I|ix_timestamp||maindbs|
I|ix_timestamp|initial_log_table|VALUES < datetime(2011-01-01 00:00:00) year to second|
T||sys_p1|VALUES >= datetime(2011-01-01 00:00:00) year to second AND VALUES < datetime(2011-02-01 00:00:00) year to second|
I|ix_timestamp|sys_p1|VALUES >= datetime(2011-01-01 00:00:00) year to second AND VALUES < datetime(2011-02-01 00:00:00) year to second|
T||sys_p2|VALUES >= datetime(2011-02-01 00:00:00) year to second AND VALUES < datetime(2011-03-01 00:00:00) year to second|
I|ix_timestamp|sys_p2|VALUES >= datetime(2011-02-01 00:00:00) year to second AND VALUES < datetime(2011-03-01 00:00:00) year to second|
T||sys_p3|VALUES >= datetime(2011-03-01 00:00:00) year to second AND VALUES < datetime(2011-04-01 00:00:00) year to second|
I|ix_timestamp|sys_p3|VALUES >= datetime(2011-03-01 00:00:00) year to second AND VALUES < datetime(2011-04-01 00:00:00) year to second|
T||sys_p4|VALUES >= datetime(2011-04-01 00:00:00) year to second AND VALUES < datetime(2011-05-01 00:00:00) year to second|
I|ix_timestamp|sys_p4|VALUES >= datetime(2011-04-01 00:00:00) year to second AND VALUES < datetime(2011-05-01 00:00:00) year to second|
T||sys_p5|VALUES >= datetime(2011-05-01 00:00:00) year to second AND VALUES < datetime(2011-06-01 00:00:00) year to second|
I|ix_timestamp|sys_p5|VALUES >= datetime(2011-05-01 00:00:00) year to second AND VALUES < datetime(2011-06-01 00:00:00) year to second|
T||sys_p6|VALUES >= datetime(2011-06-01 00:00:00) year to second AND VALUES < datetime(2011-07-01 00:00:00) year to second|
I|ix_timestamp|sys_p6|VALUES >= datetime(2011-06-01 00:00:00) year to second AND VALUES < datetime(2011-07-01 00:00:00) year to second|
T||sys_p7|VALUES >= datetime(2011-07-01 00:00:00) year to second AND VALUES < datetime(2011-08-01 00:00:00) year to second|
I|ix_timestamp|sys_p7|VALUES >= datetime(2011-07-01 00:00:00) year to second AND VALUES < datetime(2011-08-01 00:00:00) year to second|
T||sys_p8|VALUES >= datetime(2011-08-01 00:00:00) year to second AND VALUES < datetime(2011-09-01 00:00:00) year to second|
I|ix_timestamp|sys_p8|VALUES >= datetime(2011-08-01 00:00:00) year to second AND VALUES < datetime(2011-09-01 00:00:00) year to second|
T||sys_p9|VALUES >= datetime(2011-09-01 00:00:00) year to second AND VALUES < datetime(2011-10-01 00:00:00) year to second|
I|ix_timestamp|sys_p9|VALUES >= datetime(2011-09-01 00:00:00) year to second AND VALUES < datetime(2011-10-01 00:00:00) year to second|
T||sys_p10|VALUES >= datetime(2011-10-01 00:00:00) year to second AND VALUES < datetime(2011-11-01 00:00:00) year to second|
I|ix_timestamp|sys_p10|VALUES >= datetime(2011-10-01 00:00:00) year to second AND VALUES < datetime(2011-11-01 00:00:00) year to second|
T||sys_p11|VALUES >= datetime(2011-11-01 00:00:00) year to second AND VALUES < datetime(2011-12-01 00:00:00) year to second|
I|ix_timestamp|sys_p11|VALUES >= datetime(2011-11-01 00:00:00) year to second AND VALUES < datetime(2011-12-01 00:00:00) year to second|

We can see from the bottom entries that Informix has created some partition names for us, since it is managing the partitions itself. The catalog also shows an equivalent "fragment by expression" syntax which it has automatically generated from the interval we specified.

So can we drop partitions we don't need without having to delete all the data? Yes, we can: we detach the unwanted partition for January 2011 as a new table, old_logs, and then drop it as follows:

alter fragment online on table log_table detach partition sys_p1 old_logs;
drop table old_logs;

Note that, despite the online keyword, the above is not a fully online operation and will require a table lock to work.
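To script this purge each month you first need to discover the system-generated name of the oldest remaining partition. A sketch, reusing tabid 127 from above and assuming evalpos reflects range order:

-- oldest interval fragments come first
select partition, exprtext
from sysfragments
where tabid = 127
and   fragtype = 'T'
and   partition like 'sys_p%'
order by evalpos;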

Does fragment elimination work? Yes: we're now just using an automatically managed version of fragment by expression, where fragment elimination has always worked, as this shows:

set explain on;
select * from log_table
where timestamp between '2011-08-01 00:00:00' and '2011-08-02 00:00:00';

Looking at the query plan we have:

QUERY: (OPTIMIZATION TIMESTAMP: 11-23-2011 21:01:58)
------
select * from log_table where timestamp between '2011-08-01 00:00:00' and '2011-08-02 00:00:00'

Estimated Cost: 2
Estimated # of Rows Returned: 49

1) informix.log_table: INDEX PATH

(1) Index Name: informix.ix_timestamp
Index Keys: timestamp (Serial, fragments: 8 )
Fragments Scanned: (8) sys_p8 in maindbs
Lower Index Filter: informix.log_table.timestamp >= datetime(2011-08-01 00:00:00) year to second
Upper Index Filter: informix.log_table.timestamp <= datetime(2011-08-02 00:00:00) year to second

The above shows that the engine only looked in fragment no. 8.

So all in all, an excellent feature that allows us to:

  • Fragment our data by date range or another interval without having to worry about running out of partitions for future data.
  • Control where these fragments go using a single dbspace or a round-robin approach.
  • Achieve fragment elimination with suitable queries (reduce contention, improve performance).
  • Purge old data by detaching and dropping fragments, avoiding delete from statements which can thrash the logical logs and the BTree scanner.
  • Benefit from parallel data queries.