Parsing the online log

At a recent IBM roadshow there was a brief discussion where someone mentioned that they monitor their instances using a script and regular expressions to parse the online log. Using such an approach is quite straightforward, although it is somewhat tedious to code for all the possible combinations.

You can find most, if not all, of the possible messages by running:

strings $INFORMIXDIR/msg/en_us/0333/olmsglog.iem
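To give a feel for the regex approach, here is a minimal Python sketch. It assumes each online log line looks like "HH:MM:SS  message", which you should check against your own log, and the RULES table and classify function are hypothetical names of my own. You would need one pattern per message family, which is where the tedium comes in.

```python
import re

# Assumed line layout: "HH:MM:SS  message text". Verify against your own online log.
LINE_RE = re.compile(r"^(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<message>.*\S)\s*$")

# Hypothetical rule table: one entry per message family you want to recognise.
RULES = [
    (
        "logical_log_complete",
        re.compile(
            r"^Logical Log (?P<log>\d+) Complete, "
            r"timestamp: (?P<ts>0x[0-9a-f]+)\.$"
        ),
    ),
]


def classify(line):
    """Return (time, rule_name, fields), or None if the line has no timestamp."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    message = m.group("message")
    for name, pattern in RULES:
        hit = pattern.match(message)
        if hit:
            return m.group("time"), name, hit.groupdict()
    # Timestamped, but no rule recognised the message.
    return m.group("time"), "unmatched", {"message": message}
```

Anything that comes back as "unmatched" is a message you have not written a rule for yet.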

However, there is an easier way. Nearly every time a message is written to the online log, the alarmprogram is called. If you’re familiar with the alarmprogram, you’ll know that it is passed a severity value, which you can use to decide whether or not you are alerted by email. This severity value is never shown in the online log, so already you can see that other approaches might have more potential.

There is now a scheduler task called post_alarm_message, which writes online log messages to a table called ph_alert in the sysadmin database.

> dbaccess sysadmin -

Database selected.

> select tk_name, tk_description, tk_execute, tk_enable from ph_task where tk_name='post_alarm_message';

tk_name post_alarm_message
tk_description System function to post alerts
tk_execute ph_dbs_alert
tk_enable t

1 row(s) retrieved.

I guess this is there primarily for OAT (the OpenAdmin Tool), but it’s extremely useful for system monitoring because it preserves some information about the severity of the alerts and also makes them easy to query via SQL.

> select * from ph_alert where alert_time > current - 30 units minute;

id 9349
alert_task_id 18
alert_task_seq 4970
alert_type INFO
alert_color YELLOW
alert_time 2013-10-02 19:30:06
alert_state NEW
alert_state_chang+ 2013-10-02 19:30:06
alert_object_type ALARM
alert_object_name 23
alert_message Logical Log 14294 Complete, timestamp: 0x7ab0345c.
alert_action_dbs sysadmin
alert_action
alert_object_info 23001

id 9350
alert_task_id 18
alert_task_seq 4971
alert_type INFO
alert_color YELLOW
alert_time 2013-10-02 19:49:25
alert_state NEW
alert_state_chang+ 2013-10-02 19:49:25
alert_object_type ALARM
alert_object_name 23
alert_message Logical Log 14295 Complete, timestamp: 0x7ad5b988.
alert_action_dbs sysadmin
alert_action
alert_object_info 23001

2 row(s) retrieved.

Note the alert types and colours. Personally I don’t set much store by the alert colour; I prefer to go by the alert type, where the possibilities are INFO, WARNING and ERROR. These are not the same as the alarmprogram severities, which range from 1 to 5.

For monitoring purposes, a sensible query to look for alerts might be something like:

select alert_time, alert_color, alert_type, alert_object_type, alert_message from ph_alert where alert_type!='INFO' and alert_state='NEW' and alert_time > current - 7 units day order by alert_time;

Using dbaccess or an Informix API for your favourite scripting language, you can monitor for alerts quite easily.
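As a rough sketch of the dbaccess route in Python: the code below pipes the monitoring query into "dbaccess sysadmin -" and turns the one-column-per-line output into dictionaries. The function names are my own, and the parser assumes the vertical output layout shown earlier, with rows separated by blank lines. It does not try to handle long messages that wrap onto a second line.

```python
import subprocess

COLUMNS = ("alert_time", "alert_color", "alert_type",
           "alert_object_type", "alert_message")

QUERY = (
    "select " + ", ".join(COLUMNS) + " from ph_alert "
    "where alert_type!='INFO' and alert_state='NEW' "
    "and alert_time > current - 7 units day order by alert_time;"
)


def parse_rows(text, columns=COLUMNS):
    """Parse vertical dbaccess output into a list of dicts, one per row."""
    rows = []
    current = {}
    # The extra "" guarantees the last row is flushed.
    for line in text.splitlines() + [""]:
        parts = line.split(None, 1)
        if not parts:
            # A blank line ends the current row, if there is one.
            if current:
                rows.append(current)
            current = {}
            continue
        # Lines such as "Database selected." are skipped because their
        # first word is not one of the selected column names.
        if parts[0] in columns:
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return rows


def fetch_alerts():
    """Run the query through dbaccess (must be on PATH with the Informix
    environment set) and return the parsed rows."""
    result = subprocess.run(
        ["dbaccess", "sysadmin", "-"],
        input=QUERY, capture_output=True, text=True, check=True,
    )
    return parse_rows(result.stdout)
```

A cron job could call fetch_alerts() and send an email whenever the list is not empty.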

You can mark alerts as acknowledged using a statement like the one below:

update ph_alert set alert_state='ACKNOWLEDGED' where id=? and alert_state='NEW' and alert_type in ('WARNING', 'ERROR');
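From a script, the ? placeholder would be bound by your database driver. The sketch below shows the idea using Python's DB-API. Note that sqlite3 is only a stand-in here so the example runs anywhere; it assumes your Informix driver accepts the same "?" parameter style, and the demo table and its rows are made up for illustration.

```python
import sqlite3

ACK_SQL = (
    "update ph_alert set alert_state='ACKNOWLEDGED' "
    "where id=? and alert_state='NEW' "
    "and alert_type in ('WARNING', 'ERROR')"
)


def acknowledge(conn, alert_id):
    """Acknowledge one alert; return True if a row was actually updated."""
    cur = conn.cursor()
    cur.execute(ACK_SQL, (alert_id,))
    conn.commit()
    return cur.rowcount == 1


def demo_connection():
    """Build an in-memory stand-in for ph_alert (hypothetical data,
    only the three columns the update touches)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table ph_alert "
        "(id integer primary key, alert_type text, alert_state text)"
    )
    conn.executemany(
        "insert into ph_alert values (?, ?, ?)",
        [(1, "WARNING", "NEW"), (2, "INFO", "NEW")],
    )
    conn.commit()
    return conn
```

Because the where clause also checks alert_state and alert_type, acknowledging the same id twice, or acknowledging an INFO alert, simply updates nothing and returns False.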

Hopefully I’ve demonstrated that this approach is a lot easier, and probably better, than a complex regex script. Even so, maybe you shouldn’t throw away that script just yet: run both in parallel until you’re satisfied with the reliability of this approach. To make sure that the post_alarm_message task is working, I also check on a regular basis that there are entries in the ph_alert table, although on a quiet system there may not be any new entries for some time.
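One simple way to do that check is to compare the newest alert_time, for example from "select max(alert_time) from ph_alert", with the current time. The sketch below assumes the timestamp arrives as text in the format shown in the output above. The 24-hour threshold is an arbitrary default of mine and would need tuning for a quiet system.

```python
from datetime import datetime, timedelta


def alerts_look_stale(latest_alert_time, now, max_quiet=timedelta(hours=24)):
    """Return True if the newest ph_alert entry is older than max_quiet.

    latest_alert_time is text such as "2013-10-02 19:49:25".
    """
    latest = datetime.strptime(latest_alert_time, "%Y-%m-%d %H:%M:%S")
    return now - latest > max_quiet
```

In a real check you would pass datetime.now() as the second argument.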

There’s also the console log to consider: you may want to monitor it too, and that cannot be done in this way.
