Compliance is one of those things you can hardly ignore as a DBA these days. Whether it’s a PCI DSS, financial or internal best-practice audit, at some point someone is going to ask you whether you are using database auditing. In my experience auditors struggle to ask Informix-specific questions but this one always comes up.
I guess there are three answers to this question:
- Yes, we use database auditing.
- No, we don’t use Informix auditing, but we have a third-party solution somewhere else in the stack, which means someone else worries about it.
- Can we have a compensating control, please?
I rarely find that auditors are concerned with the detail of what you’re actually auditing. If you can log in, do some stuff and show them that this resulted in some messages in the audit log, they are usually happy. They tend to be more concerned about where the logs go, who can read them and so on.
While the auditors are clearly not Informix DBAs familiar with all the auditing mnemonics, they are not daft: they know they can take most of what they asked for granted next year and ask more probing questions next time.
So should you look at onaudit for your requirements? It’s been around a long time but I expect it may see a pick-up in interest as more and more systems take payments in one way or another. In some ways it could do with some updates. Integration with syslog, allowing easy upload to a centralised log server, is needed. There is an RFE open for this (id 58678). It’s not mine but it had six votes when I last checked and it deserves more!
Positives about onaudit include:
- It’s free with all editions.
- Provided you stay away from selective row auditing (I don’t cover this in this blog post) and don’t try to audit much, or any, of what your application does, the overhead is negligible.
- It gives you as a DBA a clearer idea of what is happening on your system.
So I think it’s certainly worthy of consideration. I know some customers prefer security solutions external to the database like Guardium but these are costly. I don’t know much about them so I shall leave that thought there.
Auditing needs to be part of a more general secure framework. If everyone at your site logs in as user informix or any shared account, the worst case being the same account as your application, it’s not going to be as useful. Applying rules by user will be difficult or impossible.
Some sites I’ve seen let DBAs do all their work as user informix. It certainly saves developing a more secure framework for DBAs to work in (this is not a good thing!) but it has disadvantages. Even if you avoid shared passwords by logging in as yourself and then using sudo to become informix (on UNIX), you would still need to cross-check with the secure logs on your server to see who it was, and if two people have escalated privileges at the same time it can be tricky to distinguish their actions. Ideally you need DBAs and every other real person working under their own user ids as much as possible.
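If you do use sudo for this, a minimal sketch of a sudoers entry (added via visudo) might look like the following; the group name dba is a placeholder, not something Informix mandates:

# members of group dba may run any command as user informix
%dba ALL = (informix) ALL

Having logged in as yourself, you would then escalate with:

sudo -u informix bash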
To work as a DBA without access to the informix account, you simply add yourself to the group owning the $INFORMIXDIR/etc folder and grant yourself dba in any databases you need to do DDL in, plus sysmaster, sysadmin, sysutils, sysha and sysuser (a sketch of this follows the list below). It still presents the following challenges, which may require specific sudo-type solutions:
- Starting an instance; stopping one is no problem.
- Running xtrace and certain oncheck commands.
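As a sketch of that setup (the user name jdoe is a placeholder; check which group actually owns $INFORMIXDIR/etc on your system):

ls -ld $INFORMIXDIR/etc                            # identify the owning group (often informix)
usermod -a -G informix jdoe                        # as root: add the DBA to that group
echo 'GRANT DBA TO jdoe;' | dbaccess sysmaster -   # repeat for each database needed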
Additionally as a DBA you may need root access occasionally for installations, upgrades and to use debuggers.
So before you even start there are some highly desirable prerequisites:
- Your applications use their own account or (ideally) accounts and real users cannot run ad-hoc sessions using these.
- Real users don’t use shared accounts (and especially not shared passwords). This means locking down the informix account.
- DBAs practise what they preach and administer the system under their own accounts as much as possible.
Getting this far can be a struggle but even if you’re only some of the way there, you can still proceed.
The next step is to consider whether to install Informix with role separation. I’m not going to discuss this at length, so I’ll point to the documentation. There are no real gotchas here: it works pretty much as it says on the tin. The key idea is that it separates the DBAs from the people who decide what should be audited and who can see the audit trail. In practice I think total separation is impossible because the people deciding what should be audited need to understand the impact on the system of what they audit and the volume of data this produces. It is certainly possible to slow a system down by auditing every update.
So you’re now ready to switch on auditing? Nearly. If you monitor your system via onstat or have scripts which call onmode (‘onmode -c [un]block’ being a specific example where care is required), you need to be aware that in all but the latest Informix releases, right up to and including 12.10.FC5W1, your onstat and onmode commands will run more slowly as soon as you switch on auditing. This can also affect admin API command equivalents, not just the ones which are direct equivalents for onmode. The situation can get quite bad when lots of these commands run at the same time, leading to significant delays in the response from these commands.
Fortunately there are some fixes for this:
- TURNING ON THE AUDITING LEVEL 1 ADDS AN UNNECESSARY DELAY TO ONSTAT AND ONMODE COMMANDS
This has been around for a while and appeared in 11.70.FC7W1. However, it is not very effective: it only eliminates the delay if the volume of onstat commands being run on your system is low.
- TURNING ON THE AUDITING LEVEL 1 ADDS AN UNNECESSARY DELAY TO ONSTAT & ONMODE COMMANDS
This is completely effective: onstat and onmode behave identically to when auditing is switched off. However, it only works if you do not have any audit masks which log the use of these commands.
There are workarounds for the auditing delay such as using sysmaster equivalents for the onstat commands and performing onstat commands inside an onstat -i interactive session.
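For example, as a sketch (assuming your environment is set for the instance in question):

# query sysmaster instead of parsing onstat output in a monitoring loop:
echo 'SELECT name FROM sysmaster:sysdbspaces;' | dbaccess sysmaster -

# or start one interactive session and issue commands at its prompt:
onstat -i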
Finally you’ll want to consider setting up some audit masks. I take the following approach (some example commands follow the list):
- _require mask
- This mask defines the minimum events to be audited for all users. I put everything that’s common in here.
- _default mask
- If an account is not assigned a specific mask, it will pick up all the events in this one. To avoid having to assign masks to all real users, I don’t assign them any mask and they automatically inherit this one (in addition to what is in the _require mask).
- Other masks
- For my applications and other accounts needing special treatment, I create a custom mask and assign it to the user.
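A sketch of the commands to create such masks follows. The event mnemonics here (CRDB, DRDB, CRTB, DRTB, GRDB, RVDB) are illustrative choices only, not a recommendation; pick the events that matter to you:

onaudit -a -u _require -e +CRDB,DRDB    # minimum events audited for all users
onaudit -a -u _default -e +CRTB,DRTB    # inherited by users with no specific mask
onaudit -a -u appuser -e +GRDB,RVDB     # custom mask for an application account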
Finally if you’re feeling brave switch auditing on with some commands like:
onaudit -p /path/to/audit/trail   # directory where audit trail files are written
onaudit -s 1048576                # maximum audit file size: 1 Mb
onaudit -e 0                      # error mode 0: continue processing if an audit record cannot be written
onaudit -l 1                      # auditing level 1: auditing on
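You can then display the audit configuration to check it has taken effect:

onaudit -c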
Now there is just that application security model for you to tackle.
Good luck and may you sail through your audits!
Once in a while something comes along to make a DBA’s life easier. This time it’s a change in the documented limits for Informix 12.10. The eagle-eyed amongst you will immediately spot the new line:
Maximum number of pages per index fragment: 2,147,483,647
This is a 128 times improvement on the previous limit of 16,775,134, the same as the data pages per fragment limit, which limited an index fragment to just shy of 256 Gb in a 16 kb dbspace. The new limit of 32 Tb (again with a 16 kb dbspace) is much easier to work with. It applies only to detached indices.
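The arithmetic behind those figures:

16,775,134 pages × 16 kb ≈ 256 Gb (the old ceiling)
2,147,483,647 pages × 16 kb = 32 Tb (the new ceiling)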
Maybe it’s not the sexiest improvement but it actually arrived at the same time as storage pools in version 11.70 and so has been with us for a while. In my own test I was able to build an unfragmented detached index of over 32 Gb in a 2 kb dbspace in 11.70.FC7.
Should you rely (in version 11.70) on undocumented functionality? Without your own testing, maybe not. However, it’s good that this extra breathing space for DBAs exists and – with 12.10 – is documented and fully supported.
Is it possible to have too much monitoring of an Informix instance? I would say no, especially if your monitoring is lightweight and efficient. Being able to pinpoint an issue quickly in a disaster means you can reduce the time it takes to fix it, protect your company’s money or reputation, and be a hero at the same time.
I thought it would be worthwhile to do a general blog post on monitoring, partly to see if anyone will post suggestions for things I’ve missed via the comments section. I’m not going to focus too much on the exact mechanisms, although I will give a few hints here and there. Suffice to say you can do a lot with the various onstats, some simple queries on the system tables and some bash, perl, awk and grepping to bind it all together. There are also powerful commercial monitoring tools, most notably AGS Server Studio, although personally I find there isn’t a single tool that fulfils all my monitoring needs.
Let’s start with the simplest monitor of all: is your instance up? One effective way of monitoring this is via the return code from ‘onstat -’. If it’s 5, all is well. But perhaps there’s a problem with the database listener which this won’t detect, so maybe you should also check that you can connect to your database via TCP/IP and run a simple (and quick) query.
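A minimal sketch of such a check (the alert mechanism and recipient are placeholders):

onstat - > /dev/null 2>&1
rc=$?
if [ $rc -ne 5 ]; then
    echo "Instance not on-line: onstat return code $rc" \
        | mail -s "Informix instance check" dba@example.com
fi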
Once you’ve established that your instance is online, you can check for other problems such as the following (a couple of example checks are sketched after the list):
- An excessive number of sessions waiting on locks, a logical log buffer or buffers via onstat -u.
- The number of locks in use. I would avoid querying syslocks or running onstat -k because this can become inefficient when there’s a large number of locks. Using onstat -u instead is usually good enough and doesn’t run into scalability difficulties.
- Locks with waiters.
- Logical logs not backed up.
- Backup failures or storage snapshot issues. It’s good to monitor this independently rather than rely on the success or failure of your backup script.
- Replication status for HDR and RSS servers.
- SQL error conditions encountered by your applications.
- Long transactions and how many logical logs are needed to support a particular transaction.
- The number of memory segments and free memory reported by onstat -g seg.
- Memory used by CPU VP caches. I have a separate blog post on this.
- Error messages in the online log. I have a separate blog post on this topic too.
- Error messages in the console log.
- Number of ready threads, indicating a lack of available CPU VPs.
- Dbspaces running out of unallocated space.
- Any chunks that are down.
- Excessively long checkpoints.
- Out of date statistics or distributions.
- Query plan changes and slow-running queries (this one is tricky).
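Two sketches of checks from this list: counting sessions waiting on locks (flag L in the first position of the onstat -u flags column) and reporting allocated versus free pages per dbspace from sysmaster. The threshold of 20 waiters is an arbitrary placeholder:

waiters=$(onstat -u | awk '$2 ~ /^L/ { n++ } END { print n + 0 }')
if [ "$waiters" -gt 20 ]; then
    echo "Warning: $waiters sessions waiting on locks"
fi

echo 'SELECT d.name, SUM(c.chksize) pages, SUM(c.nfree) free_pages
      FROM sysmaster:sysdbspaces d, sysmaster:syschunks c
      WHERE d.dbsnum = c.dbsnum
      GROUP BY d.name;' | dbaccess sysmaster -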
It’s worth being aware of the various system limits you can run up against on larger systems (an example check for extents follows the list):
- Tables and indices reaching the maximum number of extents, particularly in dbspaces with a 2 kb page size.
- Tables approaching the maximum number of data pages per fragment of 16,775,134. In older versions it’s also necessary to check indices against the same limit, but now the limit for detached indices is much larger (certainly it is in 11.70.FC7W1).
- Auto-numbering fields running out of bits, particularly serials. (This may not be a problem for you if you don’t mind the next serial number wrapping around.)
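An example of the extents check, as a sketch; the threshold of 200 is an arbitrary placeholder and the real per-table limit depends on page size and the version you run:

echo 'SELECT t.dbsname, t.tabname, p.nextns
      FROM sysmaster:sysptnhdr p, sysmaster:systabnames t
      WHERE p.partnum = t.partnum
        AND p.nextns > 200
      ORDER BY p.nextns DESC;' | dbaccess sysmaster -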
It’s essential to work with your sys admins and network technicians who should have a good idea of what the operating system is doing. Often a high load average can be the first sign of a problem. They may be keeping a record of system performance that allows you to look back in time to when things were working better.
Lastly, a lot can be discovered about your system by periodically reviewing performance data for the number of locks in use, sequential scans and so on. This is a different type of monitoring from what I’ve mostly covered here, which is about detecting issues as they occur, but some problems can only be spotted by analysing general system performance over time.
I think this is a pretty good list. Some of them are complete topics in themselves and worthy of separate blog posts.
There may be other things that you can do specific to your set up. I have a few monitors for specific bugs and the status of their workarounds, some of which are no longer needed. You may also choose to monitor the health of your application. For example, if there are message queues of some sort, are they being flushed quickly enough? Is your application regularly inserting data into its journal or log file as you expect?
My goal is to be able to detect any scenario it’s possible to predict, especially where we’ve had problems in the past. This implies that my monitoring needs to evolve and be improved over time.
Good and comprehensive monitoring should make your life as a DBA easier. It allows you to point others to your monitoring tools and gives others confidence in them if they have doubts about whether the database is working properly. It saves you having to dig around, running manual health checks or similar.
So then, anything I’ve missed?