Friday, January 24, 2014

Monitoring JBoss Fuse ESB with Nagios

Note: this article describes a scenario based on JBoss Fuse, but it applies to any Java environment able to run Servlets, such as JBoss EAP, WildFly, Tomcat, etc.

One of my recent activities at work has been to provide guidance on monitoring a JBoss Fuse ESB setup with Nagios/OpsView. Although more specialized solutions for this specific problem exist (the Fuse plugin for Red Hat JON), Nagios is still one of the most widespread open source monitoring tools.

You don't need to be an expert in Nagios to understand this article; I am definitely not one. But if you are and you have any suggestions to improve this solution, please let me know.

Nagios is an open source monitoring tool that, with the help of plugins, is able to collect many metrics from different kinds of services and to notify you when a specific value or a specific pattern (values over time) is identified. It can be used to monitor anything from the operating system status to the more obscure values of your custom deployed application, assuming that you specify what is important for you.

In our example, our custom application is deployed on JBoss Fuse ESB.

Most of the metrics that we want to monitor are related to Apache Camel, Apache ActiveMQ and Apache CXF. These projects already do an excellent job of exposing much of the runtime information that we are interested in. For example, Camel tells us how many messages passed through a specific component or what the status of some of our routes is.

The technology that these projects use to expose so much valuable information is JMX.

Nagios supports JMX with the help of external plugins.

We explored the following options:

check_jmx

We found some problems with this approach:

1) to allow RMI communication, the network layer needs to allow connections to specific ports.
2) the plugin supports only attributes, not operations.
3) building JMX queries is not particularly user friendly, especially if you are not a Java developer/devops.

Since we needed to invoke some operations as part of our monitoring requirements, we were forced to look for alternatives.

check_http with Jolokia

One of our first alternative ideas was to use Jolokia.

Jolokia is a Java library that exposes JMX interfaces over HTTP, with a JSON-based REST API.

To do its magic over HTTP it just needs an HTTP entry point to be invoked, that is, a Servlet. I leave you with Jolokia's official instructions to install it, but once you have those components in place, you are ready to use its JMX bridging features.

But I will also share a small trick with you:
I haven't manually installed Jolokia. Since we are already using the awesome hawt.io as a management console, and since it leverages Jolokia, everything that we needed was already there.

Let's explore the benefits of using a Jolokia-based solution.

Being HTTP based, it clearly helps with the network configuration problems:

I still find it somewhat hard to accept, but in my experience with many customers, handling the corporate network configuration is often more complicated than expected. Something that seems simple on an abstract paper diagram stops being so when you don't have details about the network topology and the only thing that you can see is that the end-to-end communication doesn't work. For this reason, depending on the popular HTTP protocol is definitely an attractive feature.

The second added benefit is that, unlike check_jmx, it supports JMX operation invocation. This last feature turns out to be handy if the metrics you are interested in are not exposed as attributes but only as operations. One example is the operation:

osgi.core:type=bundleState,version=1.5/getState

As for the ergonomics of the interface, I personally believe that it feels very straightforward.

Requests can be simple. You could end up invoking a very tidy REST endpoint via GET, something similar to this:

curl -u admin:admin http://172.17.42.1:8012/jolokia/read/java.lang:type=Memory/HeapMemoryUsage
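
To make the shape of such a request and of its answer more concrete, here is a minimal Python sketch. It only assumes the usual Jolokia conventions: a read URL of the form base/read/mbean/attribute, and a JSON answer carrying "value" and "status" fields. The function names and the numbers in the sample response are made up for illustration, and MBean names containing slashes would need Jolokia's own escaping, which this sketch does not handle.

```python
import json
from urllib.parse import quote


def jolokia_read_url(base_url, mbean, attribute, path=None):
    """Build a Jolokia GET URL that reads one JMX attribute (hypothetical helper)."""
    parts = [
        base_url.rstrip("/"),
        "read",
        quote(mbean, safe=":=,"),  # keep the ObjectName punctuation readable
        quote(attribute),
    ]
    if path:
        parts.append(quote(path))  # optional inner path, e.g. "used"
    return "/".join(parts)


def jolokia_value(response_text):
    """Return the "value" field of a Jolokia answer, or raise if it reports an error."""
    payload = json.loads(response_text)
    if payload.get("status") != 200:
        raise RuntimeError("Jolokia error: %s" % payload.get("error", "unknown"))
    return payload["value"]


# Made-up sample answer, shaped like what a read of HeapMemoryUsage returns.
SAMPLE_RESPONSE = """{
  "request": {"mbean": "java.lang:type=Memory", "attribute": "HeapMemoryUsage", "type": "read"},
  "value": {"init": 268435456, "committed": 257425408, "max": 954728448, "used": 83219552},
  "timestamp": 1390564800,
  "status": 200
}"""

if __name__ == "__main__":
    print(jolokia_read_url("http://172.17.42.1:8012/jolokia",
                           "java.lang:type=Memory", "HeapMemoryUsage"))
    print(jolokia_value(SAMPLE_RESPONSE)["used"])
```

The first call rebuilds the same URL used in the curl example, and the second one shows that, once the JSON is parsed, a single metric is just a dictionary lookup away.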

But the moment you start sending complex input payloads, you can rely on POST and external input files containing your JSON payloads. I suggest you check Jakub Korab's helpful post: http://www.jakubkorab.net/2013/11/monitoring-activemq-via-http.html
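
As a rough idea of what such a POST payload looks like, this Python sketch builds the JSON body for invoking the getState operation mentioned above. It assumes the Jolokia "exec" request format (type, mbean, operation, arguments). The helper names are hypothetical, the bundle id 74 is just the sample value used in the service definition further down, and the credentials in the usage note are the placeholder ones from the curl example.

```python
import base64
import json
import urllib.request


def jolokia_exec_payload(mbean, operation, *arguments):
    """Build the JSON body for a Jolokia 'exec' request (hypothetical helper)."""
    return {
        "type": "exec",
        "mbean": mbean,
        "operation": operation,
        "arguments": list(arguments),
    }


def post_jolokia(url, payload, user, password):
    """POST a payload to a Jolokia endpoint using HTTP basic authentication."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Payload asking for the state of bundle 74 (an assumed, sample bundle id).
BUNDLE_STATE = jolokia_exec_payload(
    "osgi.core:type=bundleState,version=1.5", "getState(long)", 74)

if __name__ == "__main__":
    # Only prints the body; calling post_jolokia() needs a reachable server.
    print(json.dumps(BUNDLE_STATE, indent=2))
```

Against a running instance you would call something like post_jolokia("http://172.17.42.1:8012/jolokia", BUNDLE_STATE, "admin", "admin") and read the "value" field of the answer.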

To use Jolokia directly from Nagios we can use the common check_http plugin, which mimics curl's behavior just like in the previous example. The only glitch is that check_http doesn't offer a way to process JSON strings, which is the structure that Jolokia returns. You could probably parse the output with regular expressions and simple value checking, but we felt that something was missing. And what is missing here is offered by the next option.
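
To show what that missing piece amounts to, here is a small Python sketch of a hand-rolled check: it parses a Jolokia answer for HeapMemoryUsage, compares the used percentage with two thresholds and returns the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name, the thresholds and the label in the performance data are my own assumptions, not something a real plugin provides.

```python
import json

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def check_heap(response_text, warning_pct, critical_pct):
    """Turn a Jolokia HeapMemoryUsage answer into (exit_code, message)."""
    try:
        value = json.loads(response_text)["value"]
        used_pct = 100.0 * value["used"] / value["max"]
    except (ValueError, KeyError, TypeError, ZeroDivisionError) as error:
        # Anything we cannot parse is reported as UNKNOWN, not as a failure.
        return UNKNOWN, "UNKNOWN - cannot read heap usage: %s" % error

    if used_pct >= critical_pct:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warning_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    # Text before the pipe is for humans, text after it is performance data.
    message = "%s - heap used %.1f%% | heap_used_pct=%.1f%%;%s;%s" % (
        label, used_pct, used_pct, warning_pct, critical_pct)
    return code, message


if __name__ == "__main__":
    # Made-up answer: 250 bytes used out of a maximum of 1000.
    sample = '{"value": {"used": 250, "max": 1000}, "status": 200}'
    code, message = check_heap(sample, 80, 90)
    print(message)
    raise SystemExit(code)
```

It works, but every new metric would need its own parsing and threshold logic, which is exactly the part that a dedicated plugin takes care of.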

check_jmx4perl with Jolokia

jmx4perl is a set of Perl libraries and scripts that allow you to communicate with JMX objects exposed by Jolokia. One of the tools bundled with the project is a Nagios plugin: check_jmx4perl

Don't be scared by the "perl" keyword. I don't write Perl and I have problems reading it, and still I can use the tool. The project gives you executable scripts that you can invoke from the command line to query JMX services exposed by Jolokia, and it also provides a Nagios-compatible executable.

With this tool you can write queries like this one:

$ check_jmx4perl \
    --user=admin \
    --password=admin \
    --url http://10.21.21.1:8012/jolokia \
    --name "[MyService - CamelContext - WebService]" \
    --mbean "org.apache.camel:context=mycontext/86-MyRoute.Request,name=\"log\",type=components" \
    --attribute "State" \
    --critical Stopped \
    --warning '!Started'

OK - MyService - CamelContext - WebService] : 'Started' as expected | 'MyService - CamelContext - WebService]'=Started;!Started;Stopped

And as you can guess from reading the previous command, the Nagios support is very direct, allowing you to specify the values that you want to be treated as a Warning status or a Critical (error) one.
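
The way I read the thresholds in that command can be sketched in a few lines of Python. This is only my interpretation of the example: a plain string matches when the value is equal to it, a leading "!" negates the match, and the critical threshold is evaluated before the warning one. The real matching rules of check_jmx4perl may well be richer, and the function names are made up.

```python
def matches(value, threshold):
    """Assumed semantics: 'X' matches X, '!X' matches anything except X."""
    if threshold.startswith("!"):
        return value != threshold[1:]
    return value == threshold


def evaluate_state(value, warning, critical):
    """Return (exit_code, label) following the Nagios plugin conventions."""
    if matches(value, critical):
        return 2, "CRITICAL"
    if matches(value, warning):
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    # Same thresholds as in the command: --critical Stopped --warning !Started
    for state in ("Started", "Suspended", "Stopped"):
        print(state, evaluate_state(state, "!Started", "Stopped"))
```

With these rules a route that is Started is fine, a Stopped one is critical, and anything in between (Suspended, for instance) raises a warning.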

If you are familiar with Nagios you know that to use an executable you have to define it in the Nagios configuration.

Here are some examples of possible macros:

### check_jmx4perl supports wildcards! (you can use an asterisk anywhere in the string names)


# Read JMX attributes without support for nested attributes 
define command {
     command_name         check_jmx4perl_attribute_absolute
     command_line         /usr/local/bin/check_jmx4perl \
                              $ARG1$ \
                              --url $ARG2$ \
                              --mbean $ARG3$ \
                              --attribute $ARG4$ \
                              $ARG5$
  }

# Check Bundle is Active
define command {
     command_name         check_jmx4perl_bundle_is_active
     command_line         /usr/local/bin/check_jmx4perl \
                              $ARG1$ \
                              --url $ARG2$ \
                              --warning \!ACTIVE \
                              --critical \!ACTIVE \
                              --mbean "osgi.core:type=bundleState,version=1.5" \
                              --operation "getState(long)" \
                              $ARG3$
  }

Once you have defined those macros in Nagios, you can define your real monitoring calls that use those commands. Something like:

# Root service definition that presets some values and variables
define service {
    use generic-service
    name jolokia
    register 0
    host_name localhost
    _agenturl http://172.17.42.1:8012/jolokia
    _authentication --user=admin --password=admin
    }

# Sample Bundle is Active
define service {
     service_description    Sample Bundle is Active
     use                    jolokia
     check_command          check_jmx4perl_bundle_is_active\
                            !$_SERVICEAUTHENTICATION$ \
                            !$_SERVICEAGENTURL$ \
                            !74 
    }
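
If the relation between the service and the command definition looks a bit magic, this Python sketch mimics, in a very simplified way, what Nagios does with them: the check_command is split on "!" into a command name plus $ARG1$, $ARG2$, ..., custom variables such as _agenturl become $_SERVICEAGENTURL$ macros, and everything is substituted into the command_line. It is only an illustration of the idea under those assumptions, not how Nagios is actually implemented, and the function name is hypothetical.

```python
def expand_check_command(check_command, command_line, custom_vars):
    """Simplified model of Nagios argument and custom macro substitution."""
    # "name!arg1!arg2" -> command name plus positional arguments.
    name, *args = [part.strip() for part in check_command.split("!")]
    for index, arg in enumerate(args, start=1):
        # A service variable "_agenturl" is exposed as $_SERVICEAGENTURL$.
        for var, value in custom_vars.items():
            macro = "$_SERVICE%s$" % var.lstrip("_").upper()
            arg = arg.replace(macro, value)
        command_line = command_line.replace("$ARG%d$" % index, arg)
    return name, command_line


# The command_line of check_jmx4perl_bundle_is_active, joined on one line.
TEMPLATE = (r'/usr/local/bin/check_jmx4perl $ARG1$ --url $ARG2$ '
            r'--warning \!ACTIVE --critical \!ACTIVE '
            r'--mbean "osgi.core:type=bundleState,version=1.5" '
            r'--operation "getState(long)" $ARG3$')

# Custom variables inherited from the "jolokia" service template.
CUSTOM_VARS = {
    "_agenturl": "http://172.17.42.1:8012/jolokia",
    "_authentication": "--user=admin --password=admin",
}

CHECK_COMMAND = ("check_jmx4perl_bundle_is_active"
                 "!$_SERVICEAUTHENTICATION$ !$_SERVICEAGENTURL$ !74 ")

if __name__ == "__main__":
    name, line = expand_check_command(CHECK_COMMAND, TEMPLATE, CUSTOM_VARS)
    print(name)
    print(line)
```

Printing the expanded line is also a handy way to get something you can paste in a terminal when a check misbehaves.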

How to test this?

Although installing and configuring Nagios is not rocket science, it's not always a straightforward activity.
Sometimes you make silly typos or just leave a space in the wrong place and nothing works. Despite the feeling that you can fix it with just a little time, it turns out to be a time-stealing activity that distracts you from your huge list of other things to do.

Or maybe you are just like me: the fact that you managed to set up everything in a couple of days doesn't mean that you will be able to remember precisely how, if asked in a month.

For all those reasons I have decided to have some fun with Docker.

Docker is a cool new tool that you can use to provide bundled stacks of applications, called containers, that can be preconfigured exactly as you want. I have put together a Docker container that starts a Nagios instance for you and provides all the plugins, scripts and sample configuration that I have discussed in this post.

In case you are not interested in Docker, you can still find the samples in this GitHub repository and read the Dockerfile, which in the end gives all the steps you need to install and configure Nagios with jmx4perl.

https://github.com/paoloantinori/docker_centos_nagios

Since I've built my knowledge on information already available on the web, here is a short list of the resources that helped me put together this tutorial:

http://www.jakubkorab.net/2013/11/monitoring-activemq-via-http.html
http://search.cpan.org/~roland/jmx4perl-1.07/scripts/check_jmx4perl#Parameterized_checks
http://labs.consol.de/lang/en/blog/jmx4perl/check_jmx4perl-einfache-servicedefinitionen/

In case you have any other interesting approaches to the problem, please leave a comment.