Large table operations

Posted: 6 August, 2021
Many Informix systems are at the heart of 24/7 production services and getting downtime for maintenance can be costly. Your business may have adopted Site Reliability Engineering and its concept of error budgets or just needs to minimise the times when systems are unavailable for customers.
The same systems can also store increasing volumes of data and this means thought has to be given to how large operations are approached. Every junior DBA has probably felt the need to hit Ctrl-C during an operation taking too long and the duration of any roll back needs to be weighed against letting an operation continue before doing so. Being able to replicate your production system accurately can allow you to assess the merits of each and how much time you need.
Here are some examples of large table operations a DBA might need to undertake:
- build a new index, partition an existing index or convert to a forest-of-trees index
- table partitioning or change of page size
- large scale data deletion
This blog post will cover some of the methods and gotchas there are.
| Method | Transaction handling | New index | Table partitioning | Change of table page size | Managing data |
| --- | --- | --- | --- | --- | --- |
| ALTER FRAGMENT… ADD PARTITION… | Single transaction | No | Add partitions to an already partitioned table | No | No |
| CREATE INDEX… | Single transaction | Simple create index statement | No | No | No |
| ALTER FRAGMENT… INIT… | Single transaction | No | Rebuild table in one operation | Rebuild table in one operation | No |
| ALTER FRAGMENT… ATTACH or DETACH | Single transaction | No | Partition a non-partitioned table | No | Attach or detach whole table partitions |
| Loopback replication | Row | Copy of the whole table with new index | Copy of the whole table with the new storage schema | Copy of the whole table with the new storage schema | Copy of the table with a subset of the data |
| High Performance Loader (HPL) | Configurable commit interval | Copy of the whole table with new index | Copy of the whole table with the new storage schema | Copy of the whole table with the new storage schema | Copy of the table with a subset of the data |
Before you go ahead in production
All the SQL data definition methods complete in a single transaction and this means you need to consider logical logging requirements.
Do you have enough logical log space to support the operation? The onconfig parameter LTXHWM determines the percentage of logs that can be used before any long transactions are automatically rolled back. Rolling back consumes further log space, and reaching LTXEHWM means the long transaction being rolled back gets exclusive use of the system. It is best to make sure an unexpected roll back does not occur. Rolling back became faster in 12.10 with the use of read-ahead, but it is important to note that rolling back is generally a single-threaded operation.
If you are backing up your logical logs properly, as you should be, you will also want to make sure you do not switch logs appreciably faster than you can back them up. Some backup solutions work more efficiently with larger logs.
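As a rough sketch, logical log consumption and any long transactions can be watched with standard onstat options while the operation runs (run as user informix on the instance):

```shell
# Watch logical log usage while a large operation runs.
onstat -l      # logical log files: flags and percentage used
onstat -x      # open transactions and their begin/current log positions
onstat -d      # chunk and dbspace usage, including temp dbspaces used for sorts
```

Repeating these at intervals gives a feel for the log switch rate relative to your backup rate.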
The ideal test environment is a clone of your production system. A clone or redirected restore can then be used to:
- repeatedly test and refine your implementation.
- get reasonably accurate timings.
- understand logical log requirements.
- find out whether you can do all or some of your work with the system online (this can be tricky).
If you aren’t privileged enough to have such an environment, you need to ensure your test system is as close to production as it can be. For this type of test you need:
- A copy of the table being worked on with the same storage layout (dbspaces, page sizes, fragmentation strategy).
- A copy of any tables referencing or referenced via a foreign key.
- Some data.
Often only a small amount of data is needed but it is fairly straightforward to populate test tables with tens of millions of rows, which may be enough to make any operations which could be time consuming noticeable.
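One hedged way to inflate a test table is to insert the table into itself with an offset on the unique key column, doubling the row count on each pass; the table and column names here are illustrative, not from this post:

```sql
-- Double the row count on each execution; the fixed offset keeps my_id unique.
-- Run repeatedly until the table is large enough for meaningful timings.
INSERT INTO ttest (my_id, payload)
SELECT my_id + 100000000, payload FROM ttest;
```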
Having scripts to create the test conditions from scratch which you can re-run or get the system to a state where you can backup and repeatedly restore is essential.
Easy win: ALTER FRAGMENT… ADD PARTITION…
Informix 12.10.xC3 onwards: if your table uses round-robin partitioning and partitions are filling up you can simply add more partitions to it and enable AUTOLOCATE in the onconfig file:
ALTER FRAGMENT ON TABLE my_table ADD PARTITION my_partition_09 IN my_dbspace1;
ALTER FRAGMENT ON TABLE my_table ADD PARTITION my_partition_10 IN my_dbspace2;
This is quick and easy, and avoids the complications of the other methods described here. I'll cover the options for tables which are not already partitioned shortly.
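You can confirm whether a table is already round-robin partitioned by querying sysfragments; a sketch, assuming the table name, where 'R' in the strategy column indicates round-robin:

```sql
-- List the table's fragments and strategy ('R' = round-robin, 'E' = expression).
SELECT f.strategy, f.partition, f.dbspace
FROM systables t, sysfragments f
WHERE t.tabid = f.tabid
  AND f.fragtype = 'T'
  AND t.tabname = 'my_table';
```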
CREATE INDEX…

Moving on to something a bit different, but it will make sense later.
Creating an index is easy but it’s important to bear in mind a few well-known requirements or optimisations:
- having adequate sort space (DBSPACETEMP or file system with PSORT_DBTEMP) to avoid partial builds.
- improving parallelism by setting PDQPRIORITY (if using Enterprise Edition).
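Put together, a build might look like this sketch (the table, column, index and dbspace names are assumptions; PDQPRIORITY can also be set via the environment):

```sql
-- Allow the build to use parallel sort threads and temp space.
SET PDQPRIORITY 100;
CREATE INDEX ix_mytable_col1 ON mytable (col1) IN my_dbspace1;
SET PDQPRIORITY 0;
```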
If index page logging is enabled on your system, it can substantially increase the index build time because the build then has three main stages before locks are released:
- build the index, working with temp space if too large to do in memory.
- write the new index to its dbspace(s).
- scan the new index (ipl thread), generating logical log records to create a copy of it on your secondaries.
You can attempt to build an index online but this may not work on busy active tables.
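The online variant is a single keyword on the end of the statement; this sketch uses illustrative table and index names:

```sql
-- Builds without holding an exclusive lock on the table for the duration,
-- but may not succeed on busy, active tables.
CREATE INDEX ix_mytable_col2 ON mytable (col2) ONLINE;
```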
The important thing to bear in mind is that the operations we go on to cover may perform index rebuilds, and the same optimisations can be applied to them.
ALTER FRAGMENT… INIT…
This is also a very easy and tempting option for rebuilding a table: a single command which manages it all for you and leaves all references, constraints and other aspects of the logical schema intact. This avoids a lot of the preparation and testing involved with other methods.
We can use this to rebuild or repartition both tables and indices. So why don’t we use this method all the time?
- statement runs as a single transaction.
- table is locked during the process.
- when altering a table all the indices on the table need to be updated with new pointers.
For large tables, especially those with a lot of indices, there will be a lot of updates written to the logical logs. There is a danger the transaction will roll back, either due to reaching LTXHWM or a DBA aborting.
We also have to do one operation at a time, so repartitioning both a table and its indices would involve rebuilding the indices as part of the table operation and then rebuilding them a second time.
In my opinion this operation is only a good choice for small tables or where time is not critical.
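For completeness, a minimal sketch of the command (table and dbspace names are assumptions); remember it runs as a single transaction with the table locked:

```sql
-- Rebuild the table in a different dbspace (e.g. one with a larger page size).
ALTER FRAGMENT ON TABLE my_table INIT IN my_16k_dbspace;

-- Or rebuild with a round-robin partitioning scheme in one operation.
ALTER FRAGMENT ON TABLE my_table INIT
    FRAGMENT BY ROUND ROBIN IN my_dbspace1, my_dbspace2;
```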
ALTER FRAGMENT… ATTACH or DETACH
This operation allows you to merge tables or split partitioned tables. As with the “init” operation, this keeps the logical schema intact but can be a slow process if table row movement occurs or indices need to be rebuilt.
With particular partitioning strategies it is possible to attach or detach partitions with no table row movement, and in a subset of these cases no index changes occur either, which makes the whole operation very fast. Unless you planned to use these features and designed your schema accordingly, it's unlikely you will stumble across them.
If you are thinking of using this method to partition an unpartitioned table, it is possible to avoid table or index movement when moving to an expression-based table or, from 12.10.xC15, a range-partitioned table. The starting conditions are quite strict, though: any indices on the existing table must reside in the same dbspace as the table, and a second table with an identical logical schema must be created whose indices “follow the table”, that is, they do not have a storage clause of their own.
Here is an example of a table which could be partitioned using this method:
drop table if exists "benthompson".toriginal;

create schema authorization "benthompson"
create table "benthompson".toriginal (
    my_id int not null,
    secondary_id integer default null,
    created datetime year to second default current year to second not null,
    vchar1 varchar(15) not null,
    tertiary_id integer not null,
    expiry_date datetime year to second default null
) in my_dbspace1 lock mode row;

create unique index "benthompson".ioriginal_x1 on "benthompson".toriginal (my_id, vchar1, tertiary_id) using btree in my_dbspace1;
create index "benthompson".ioriginal_x2 on "benthompson".toriginal (secondary_id) using btree in my_dbspace1;

alter table "benthompson".toriginal add constraint primary key (my_id, vchar1, tertiary_id) constraint "benthompson".coriginal_pk;
The indices need to all be capable of following the table, meaning we could not have another unique index unless the leading column was my_id.
We can create an empty table with the same schema as follows:
drop table if exists "benthompson".texpression;

create table "benthompson".texpression as select * from toriginal where 1=0;
Then partition it by expression:
alter fragment on table texpression init fragment by expression
    (my_id < 0) in my_dbspace1,
    (my_id >= 910000000) in my_dbspace2;
Or with range partitioning:
alter fragment on table texpression init fragment by range (my_id)
    interval (10000000) partition tfm_range_p0 values < 0 in my_dbspace2;
The new table will not have any indices so index it:
create unique index "benthompson".iexpression_x1 on "benthompson".texpression (my_id, vchar1, tertiary_id) using btree;

create index "benthompson".iexpression_x2 on "benthompson".texpression (secondary_id) using btree;
As the new table is empty all these operations will be quick.
Drop constraints on the original table (required for the attach):
alter table toriginal drop constraint coriginal_pk;
Attach the original table to the new:
alter fragment on table texpression attach toriginal as (my_id>=0 and my_id <10000000) after my_dbspace1;
or for range partitioning:
alter fragment on table texpression attach toriginal as partition texpression_p1 values < 10000000;
Finally add a primary key to the new table:
alter table "benthompson".texpression add constraint primary key (my_id, vchar1, tertiary_id) constraint "benthompson".cexpression_pk;
If doing attach operations in quick succession on the same table, you may wish to disable AUTO_STAT_MODE; otherwise you may need to wait for an UPDATE STATISTICS operation to complete in between.
Detaching a partition is straightforward syntactically but again may be slow unless the index is partitioned the same way as the table.
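A detach sketch using the names from the earlier example (the name of the new standalone table, tdetached, is an assumption):

```sql
-- Split the named partition off into its own table, tdetached.
ALTER FRAGMENT ON TABLE texpression DETACH PARTITION texpression_p1 tdetached;
```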
For a quick detach without any index rebuilds, the database engine requires the index to follow the table (sysfragments ‘strategy’ column set to ‘T’), but there is a recent fix which makes this work for expression-based indices too, provided the expression is identical to one belonging to the table.
IT34340: ALTER FRAGMENT DETACH ON TABLE FRAGMENTED BY INTERVAL REBUILDS INDEX EVEN WHEN INDEX HAS SAME INTERVAL EXPRESSION
Once a fragment has been detached, the engine will rebuild table statistics if AUTO_STAT_MODE is enabled, and this will prevent a second fragment from being detached until the rebuild completes.
If testing with a small table it can be difficult to tell how efficient the operation was. To verify whether your operation results in row movement or implicit index rebuilds, you can use ‘oncheck -pe’ to compare the extents for the table and any indices before and after the operation: if no movement occurred, all extents should remain the same even though the table names may change.
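A simple before/after comparison can be scripted like this (the file paths are illustrative):

```shell
# Capture the extent map, run the attach/detach, then capture and compare again.
oncheck -pe > /tmp/extents_before.txt
# ... perform the ALTER FRAGMENT operation ...
oncheck -pe > /tmp/extents_after.txt
diff /tmp/extents_before.txt /tmp/extents_after.txt
```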
Loopback replication: overview and setup
Loopback replication was introduced in 12.10.xC11 and is perhaps most easily thought of as Enterprise Replication (ER) between two tables residing in the same instance. We can use it to rebuild a table by creating a new copy of it with whatever storage schema we desire. The new table could differ from the original in a number of ways: it could be indexed differently or have fewer or more columns, for example.
Using loopback replication requires some preparation before anything can be replicated. I won’t cover it all in detail here because it is in the Informix manual. In short you need to:
- Create dbspaces and an sbspace for CDR_DBSPACE, CDR_QHDR_DBSPACE and CDR_QDATA_SBSPACE.
- Tune the onconfig, especially CDR_EVALTHREADS, CDR_QUEUEMEM and CDR_MEM.
- Make changes to your sqlhosts file and the onconfig variable DBSERVERALIASES.
- Define servers using ‘cdr define server’.
cdr define server -A /var/informix/ats/g_my_instance_er_server -R /var/informix/ris/g_my_instance_er_server -I g_my_instance_er_server

cdr define server -A /var/informix/ats/g_my_instance_loopback -R /var/informix/ris/g_my_instance_loopback -I g_my_instance_loopback -S g_my_instance_er_server
I will refer to the original table as the “source” and the new table as the “target”. In my examples the source and target will share the same logical schema.
The advantage of loopback replication is that the copying of data which takes the most time can take place while your system is running, in much the same way as ER replicates tables on a running system. The steps which do require exclusive access are fast.
So how can we use it to make a copy of, say, a one-billion-row table, with the end result being two large tables with exactly the same data being replicated asynchronously? At the end of the process we want to switch off replication and rename the new table into position, but how do we get to that point?
All approaches start with creating a new empty table just as we want it. As we don’t want to have to do large index builds at the end of the exercise we should create all indices and constraints on the target at the beginning. We might choose to add foreign keys, both referencing other tables and others referencing this table, and triggers at the end.
If you’re unfamiliar with ER, tables being replicated are called “replicates”. Strictly speaking a replicate isn’t necessarily the whole table: it could be just a subset of its columns and a subset of its rows, filtered by a where clause.

When you start replicating from one replicate to another, what goes across? Only new transactions are replicated. For an insert this is straightforward (provided it does not violate a constraint on the target), but updates and deletes may find there is no row on the target to update or delete. You can control what happens in these scenarios with the CDR_SUPPRESS_ATSRISWARN onconfig parameter; amongst other things this can enable “upgrading” an update to an insert. One important thing to note is that unless your table represents some kind of rolling log, it is not sufficient to just start loopback replication: you will need to do something to load the existing data.
Creating a replicate:
cdr define repl --connect=g_my_instance_er_server loopback_replicate \
    --conflict=always --scope=row --ats --ris --floatieee \
    --master=g_my_instance_er_server \
    "P my_dbname@g_my_instance_er_server:benthompson.original_table" "select * from original_table" \
    "R my_dbname@g_my_instance_loopback:benthompson.new_table" "select * from new_table"
Don’t forget to start the replicate.
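Starting it is a one-liner, using the replicate name defined above:

```shell
cdr start replicate loopback_replicate
```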
Pre-loading existing data
Let’s consider some ways we can load data that pre-exists when we began replicating. This is the hardest part.
- Perform a dummy update on rows at the source to cause them to be replicated to the target.
- Use ER’s built-in tools to check and repair or sync the replicates.
- Use a combination of (2) and High Performance Loader (HPL) or other load method to do the heavy lifting.
(1) is a home-brew approach but gives you control. A dummy update is simply an update which doesn’t actually change anything, i.e. setting a value to what it’s already set to, but it can be used to flag old data for replication. You probably don’t want to use this method, but it is there.
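As a sketch, a dummy update on the toriginal example table might look like this (the predicate is an assumption; work in batches to avoid creating one long transaction):

```sql
-- Set a column to itself: no data change, but the update is logged
-- and therefore replicated. Restrict the range and commit in batches.
UPDATE toriginal
SET secondary_id = secondary_id
WHERE my_id BETWEEN 0 AND 999999;
```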
cdr check (with repair option)
‘cdr check’ goes through every row on the target, checks it exists on the source and that all values within the replicate are the same. It will also find any extra rows in the target replicate that are not in the source. With its repair option it will not just produce a report: it will also correct the data by pushing updates onto the send queue. It works best when the differences between the source and target are not large, so preloading as much data as possible is a good idea. HPL is an efficient and straightforward way of doing this (worked example below).
‘cdr check’ isn’t quick and it is single-threaded. On very busy tables it may not be able to keep up with the rate of change. A checksum is calculated over all columns in the replicate not in the primary key (or unique index), on both the source and the target, and compared. Rates of 2000 rows/second could be all you see. It is possible to vastly improve its performance by adding a pre-calculated “replcheck” column and indexing it in a new unique index which also contains the primary key. You will probably not be able to take advantage of this, because adding the replcheck column on the source presents the same challenges you are trying to work around in the first place, and additionally adding the replcheck column locks the table during the entire operation.
‘cdr check’ example:
nohup cdr check replicate --master=g_my_instance_er_server --repl=loopback_replicate g_my_instance_loopback --repair --name loopback_replicate_check_repair >loopback_replicate.out 2>&1 &
Check status with (this only works if your check replicate job has been given a name):
cdr stats check
An alternative is to use ‘cdr sync’ to bring the target in line with the source. An empty target table actually permits certain optimisations, so if the target only contains a small subset of the data it is best to truncate it and start from empty. The database will build a shadow replicate which is deleted at the end of the sync. I don’t recommend ‘cdr sync’ unless you have 14.10.xC6 or later: in earlier versions a few undesirable things can happen at particular points in the process, such as a bloated send queue and a large number of row locks on participating tables when deleting the shadow replicate. ‘cdr sync’ deserves a whole article of its own, but if you do wish to use it, make sure the CDR_EVALTHREADS onconfig parameter is fewer than 10 and ignore the manual’s recommendation to have at least one per CPU VP.
Onconfig tuning for loopback replication
| Parameter | Suggested value | Notes |
| --- | --- | --- |
| CDR_EVALTHREADS | One per CPU VP, but never more than 10 | Do not follow the manual’s recommendation of one per CPU VP on large systems: extra threads will just wait on a mutex. |
| CDR_QUEUEMEM | 65536 – 131072 if earlier than 14.10.xC6 | 64 – 128 MB is the recommendation from HCL support and larger values may be detrimental. In 14.10.xC6+ larger values may be beneficial. |
| CDR_SUPPRESS_ATSRISWARN | 3 | Depends on your use case; ‘3’ will convert updates to inserts if rows are not on the target. |
| CDR_QDATA_SBSPACE | One or more smart blob spaces | Large loopback jobs can result in a large send queue in versions earlier than 14.10.xC6, which could easily need 10s of GBs. Smart blob spaces used by ER do not auto-expand. |
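As an illustrative onconfig fragment only (the values follow the guidance above; the sbspace name is an assumption):

```
CDR_EVALTHREADS         1,2
CDR_QUEUEMEM            131072
CDR_SUPPRESS_ATSRISWARN 3
CDR_QDATA_SBSPACE       cdr_sbspace
```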
A few tidy-up jobs:
- Stop and delete replicate.
- Switch tables, may include:
- Drop foreign keys on original table.
- Rename constraints and indices.
- Rename original table and rename new table in its place.
- Drop and recreate any triggers.
- Recreate any foreign keys referencing the original table; find them with:
select t.tabname, c.constrname
from sysconstraints c, systables t
where c.tabid = t.tabid
and c.constrid in (
    select constrid from sysreferences where ptabid in (
        select tabid from systables where tabname='XXXXXXXX'
    )
);
High Performance Loader (HPL)

For HPL, unload and load jobs are needed.
Create a named pipe (this does not work on Windows; if using Windows you would need to use an intermediate file):
mknod /home/informix/mytable.pipe p
Create the unload and load jobs:
onpladm create job mytable_unload -d 'cat > /home/informix/mytable.pipe' -fup -D dbname -t original_table -T my_informixserver

onpladm create job mytable_load -d 'cat /home/informix/mytable.pipe' -flpc -D dbname -t new_table -T my_informixserver
Creating these jobs will create an onpload database in your instance if it does not already exist.
Run jobs in two different windows:
onpladm run job mytable_unload -fu -l /home/informix/mytable_unload.out -S my_informixserver

onpload -j mytable_load -fl -I 200000 -i 200000 -l /home/informix/mytable_load.out
You may want to run these in the background using nohup, e.g.
nohup onpladm run job mytable_unload -fu -l /home/informix/mytable_unload.out -S my_informixserver >unload.out 2>&1 &

nohup onpload -j mytable_load -fl -I 200000 -i 200000 -l /home/informix/mytable_load.out >load.out 2>&1 &
The log files are quite verbose and should flag any problems.
I believe the above is much easier than writing a stored procedure, and HPL handles the transaction size (200000 rows per commit in the example above). Anyone who remembers the dated and unfriendly HPL GUI will notice I did not need to use it.
If HPL goes wrong you can start again by:
- killing any onpload or onpladm processes.
- dropping the onpload database (this removes all job definitions).
- truncating the target table.
There is a bug (IT36978) where HPL corrupts compound forest-of-trees indices, which may be fixed in 12.10.xC15 or 14.10.xC7. Avoid defining such an index on target tables you wish to use with HPL.
This material has also been presented on IIUG TV.