Monitoring Informix with GrafanaPosted: 3 January, 2018 | |
In a presentation I gave at IIUG 2017 titled Making system monitoring better I showed, without much detail, how Grafana is a powerful tool for visualising what is happening within your Informix server. Performance metrics from your database server are collected at regular (usually 10 second) intervals and stored in a time-series database which can be used as the source for dashboards containing dynamic graphs and other ways of presenting the data. For the DBA the benefits are legion:
- Quickly change the time range to zoom into when problems occurred or zoom out to see trends.
- Correlate various database metrics and combine then with related operating system, network, storage or application metrics.
- Get a truer picture of your busy periods, capacity, the effect of scheduled jobs etc.
- Faster problem resolution because a lot of data can be visualised at once.
You might be able to think of some others.
The talk also touched on the CAMS acronym:
So you shouldn’t keep your dashboards to yourself: share them with other technical teams or everyone in your company. This has the added benefit of more eyes and others can learn to spot database problems, or when they are probably not database problems, by referring to these.
There are a number of tools which appear to do a similar job:
You perhaps haven’t heard of Brunia: it is the code name for a prototype monitoring tool that may replace Informix Open Admin Tool (OAT) in the future. It was demonstrated at IIUG 2017 and is probably closest to Prometheus in its execution. AGS Sentinel is the monitoring add-on to the popular ServerStudio suite for Informix. The rest are popular open source tools which other teams in your organisation are probably already using.
Some of the tools listed above can also produce events or alerts when a trigger condition occurs and automatically pass this up a stack to PagerDuty or another call-out system. An example of such a stack is Prometheus -> Alertmanager -> PagerDuty -> StatusPage
There are a lot of ways of implementing a full monitoring stack with choices to make about data collection, storing, visualisation, analysis and alerting. In this blog post I am going to concentrate on a simple set up where we collect Informix metrics, pass them to InfluxDB using a REST API and visualise in Grafana. For a fuller discussion of the benefits of the three open source technologies mentioned above I highly recommend reading this blog post from Loom Systems written in June 2017, Prometheus vs. Grafana vs. Graphite – A Feature Comparison.
In case you’re not going to read the LS blog it’s worth emphasising what the InfluxDB/Grafana solution I am about to describe does not provide:
- There is little in the way of monitoring and alerting features.
- Regression models enabling you to predict the value of a metric in the future are not available.
- Advanced time series functions are not available.
The solution would be richer if Graphite was used as the data source and Grafana for visualisation only. This would provide more aggregation functions and allows you to do things like subtract one time series from another. As an example of what this might provide, I have a dashboard (not covered in this blog post) displaying the buffer turnover ratio and buffer waits ratio over an arbitrary moving window irrespective of when onstat -z was last run.
It is easy to confuse Graphite and Grafana, especially as both can be used independently, or Graphite can be a data source for Grafana.
As this is an Informix blog I ought to explain why I am using InfluxDB and not Informix time series? The simple answer is that to use Informix time series with Grafana properly someone would have to write and maintain a data source plugin for it like the one for InfluxDB we’ll be using. Doing so would give something more feature rich than InfluxDB for sure but perhaps not much more powerful than a Graphite/Grafana combination.
What to monitor
Potentially anything we can put a value to every ten seconds can be collected and stored in InfluxDB (which is a statement you can make about time series collections in general). For Linux operating system metrics there is a well-established collection daemon called collectd and, if I had better C programming skills, I could a collectd plugin for Informix.
For Informix systems the most obvious source is the system monitoring interface (SMI) which is the presentation of information held in shared memory through pseudo-tables in the sysmaster database. This covers the vast majority of what can be collected using onstat but is easier to handle in a programming language. Doing it this way means we can also collect real table data in the same manner.
For example the system profile or onstat -p can be captured with the following SQL:
TRIM(name) AS name,
name IN ('dskreads', 'bufreads', 'dskwrites', 'bufwrites', 'isamtot', 'isopens', 'isstarts', 'isreads', 'iswrites', 'isrewrites', 'isdeletes', 'iscommits', 'isrollbacks', 'latchwts', 'buffwts', 'lockreqs', 'lockwts', 'ckptwts', 'deadlks', 'lktouts', 'numckpts', 'plgpagewrites', 'plgwrites', 'llgrecs', 'llgpagewrites', 'llgwrites', 'pagreads', 'pagwrites', 'flushes', 'compress', 'fgwrites', 'lruwrites', 'chunkwrites', 'btraidx', 'dpra', 'rapgs_used', 'seqscans', 'totalsorts', 'memsorts', 'disksorts', 'maxsortspace')
It’s important to understand that all of these metrics are what I’d term counters. That is they only increase over time (unless they get so large they run out of bits and wrap or a DBA runs onstat -z). It gets difficult to see on a graph the difference between, say, 2394472 and 2394483 and so it’s useful to calculate a delta over the ten second window. Some things you might collect are automatically suitable for graphing because they are gauges: an example of this is the number of threads in your ready queue at any given moment.
Practical demonstration with Docker containers
Nothing better than an example you can try at home (or work!). In the implementation example I will be using the IBM Informix Developer Edition Docker container which, at time of writing, runs Debian 8 (jeesie) and a second Docker container for InfluxDB and Grafana. You’ll of course need Docker installed on your laptop or workstation for this to work.
What this demonstration is going to build will look like the above. A collector script will collect metrics from Informix at a regular interval and post the results to InfluxDB. You will be able to use your usual web browser to connect to Grafana and visualise the data. Sounds simple?
We’ll start by setting up the InfluxDB/Grafana Docker container which will be also be using on a (minimal) Debian installation. In a terminal run:
docker pull debian
docker run -it --name influx_grafana_monitoring -p 8086:8086 -p 3000:3000 --hostname grafserv -e "GF_SECURITY_ADMIN_PASSWORD=secret" debian
Your terminal should now be inside the Docker container and logged in as root. Run these commands to install some extra packages and then InfluxDB:
apt-get -y install curl gnupg apt-transport-https procps
curl -sL https://repos.influxdata.com/influxdb.key | apt-key add -
echo "deb https://repos.influxdata.com/debian jessie stable" | tee -a /etc/apt/sources.list
apt-get -y install influxdb
Next install Grafana in the container:
Start the both services inside the container:
We need to create an Influx database to store our time series data and we can do this with a REST API call:
curl -i -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE informix"
If it works you should see a HTTP/1.1 200 OK response.
You should now be able to access the Grafana server running in your Docker container from your usual web browser at http://localhost:3000/
Log in with the user name admin and the password secret. Once logged in click Add data source and fill in the settings as follows (some of them are case-sensitive):
- Basic auth
- Leave unticked
- With credentials
- Leave unticked
- Leave blank
- Leave blank
- Min time interval
- Leave at 10s
All being well you should see Data source is working in a big green box.
Now we are going to set up the Informix container to monitor. On your workstation in another terminal run:
$ docker pull ibmcom/informix-developer-database
docker run -it --name iif_developer_edition --privileged -p 9088:9088 -p 9089:9089 -p 27017:27017 -p 27018:27018 -p 27883:27883 --hostname ifxserver -e LICENSE=accept ibmcom/informix-developer-database:latest
The command above should provide a running Informix instance which may take a few moments after which control is passed back to the terminal. We are now going to get the scripts that will send data to InfluxDB:
apt-get -y install git libdbi-perl libjson-perl libwww-curl-perl make gcc libtest-pod-perl
We need to get the Perl DBD::Informix package from CPAN which will download, compile, test and install it for us.
Enter ‘yes’ to configure as much as possible. In the CPAN console type the case-sensitive command:
There is quite a lot that could go wrong in the CPAN console but it should work if you’re using the IBM Informix DE Docker container and follow the steps exactly. If you’re installing on RedHat Linux or a derivative the required RPM package names you use with yum install will all be different.
Type logout to exit the root shell. You should be logged in as user informix again. Leave this session for later.
Run on your local workstation (outside both Docker containers) in another terminal window:
This assumes you have git installed. There are two important files in the post_to_influxdb directory:
You will need to edit informix_smi.json and change all references from mydatabase to the name of the user/application database you want to monitor. For the purposes of the blog post in this demo, we are just going to monitor the sysmaster database so change all mydatabase references to sysmaster.
You can copy the files to your Informix Docker container as follows. To get the name of your Informix Docker container (not its hostname) simply type docker ps on your workstation.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5a9c73712429 debian "bash" 8 minutes ago Up 8 minutes 0.0.0.0:3000->3000/tcp, 0.0.0.0:8086->8086/tcp influx_grafana_monitoring
e1c178b57164 ibmcom/informix-developer-database:latest "/bin/bash informi..." 13 minutes ago Up 13 minutes 0.0.0.0:9088-9089->9088-9089/tcp, 0.0.0.0:27017-27018->27017-27018/tcp, 0.0.0.0:27883->27883/tcp iif_developer_edition
From the above my Informix container name is e1c178b57164
docker cp informix_smi_influx_uploader e1c178b57164:/home/informix
docker cp informix_smi.json e1c178b57164:/home/informix
We should be ready to start collecting metrics and posting them to InfluxDB. Run in the Informix container as user informix:
./informix_smi_influx_uploader -c ./informix_smi.json -i 10 -u http://other_container_ip:8086/write?db=informix
Change other_container_ip to the IP address of your InfluxDB/Grafana container. You must use the IP address unless you have name resolution which this basic Docker set up does not. If you don’t know what this is you can ping the docker container name from inside the InfluxDB/Grafana container using a command like ping -c 1 grafserv
All being well the Perl script should run continuously and collect and post data to InfluxDB every 10 seconds or whatever interval you specified with the -i switch.
To see anything in Grafana we’ll need to set up a dashboard. The file informix_server.json in the grafana_dashboard directory describes a Grafana dashboard. You’ll need to edit it a bit first and change all occurrences of the following:
- <%= @hostname %>
- hostname of your Informix docker container, normally the unqualified hostname of your Informix server
- <%= @informixserver %>
- Name of your Informix instance, dev
In the Grafana web browser click the Grafana logo and then Dashboards. Click Import. At Import Dashboard click Upload .json File. Hey presto, you should have a dashboard with graphs. Some may not display any data, e.g. Temporary dbspace usage, because there are no temporary dbspaces in the development Docker image by default.
Making this ready for production
There are a few bits I’ll leave to you for any production implementation:
- The collection shouldn’t run as user informix. Create a user for the monitoring and give it just the CONNECT and SELECT privileges it needs.
- You’ll also need to write a script to start/stop the collection with the instance.
- Linux operating system statistics, gathered through collectd would complement the dashboard very nicely.
- You’ll probably want to customise both the JSON file containing the SMI queries and the one describing the dashboard. You could add application metrics than can be collected from the database or create multiple dashboards especially if you don’t like the idea of one big one showing everything. Bear in mind any queries you write need to be fast and will run every 10 seconds.
- In the dashboard you may need to add/remove different page sizes to/from the buffer, page and disk reads/writes graphs.
It can be very useful to track virtual segment and temp. space usage on your server and correlate with events like update stats, ontape/onbar backups or application activity. You can use other tools to do this but these often are not as accessible or are purely in the realm of the DBA. A Grafana dashboard like the one described here should be very useful for you and colleagues, especially if they have their own dashboards on the same system which allow you to view your systems as a whole, and it might go some distance to demystifying Informix in your organisation.