Monday, October 19, 2015

Debugging tip: How to simulate a slow hardisk

As a Software Engineer there are times when you’d like to have a slower system.

It doesn’t happen really often actually: usually it’s when someone reports a bug on your software that you have never seen before and that you can not reproduce.

The majority of time, the reason of those ghost bugs are race conditions.

Race conditions, are issues you might face with multithreaded programming. Imagine that your software does multiple things at the same time. Despite most of the time those things happen in the expected and intuitive order, sometimes, they don’t; leading to unexpected state of your program.

They are indeed a bug. It’s dev’s fault. But the dev doesn’t have much way to protect himself from them. There are programming styles and technologies that push you to avoid the risk altogether, but I think that in general, they are a condition that each developer has to be familiar with.

So why a slower system helps?

Because most of the times, race conditions are masked by the fact the operations still happen “reasonably quickly”. This ambiguos “reasonably quickly” is the main issue. There is no clear limit or number that tells you how quickly. You just have higher chances to see them if things are slow enough to show they are not happening in the correct order or they are not waiting for the correct checkpoints.

In my experience with java applications, the main performance related aspect, while reproducing race condition is disk access speed. More thatn cpu speed or the amount of RAM, I have noticed that disk speed is the biggest differentiation between similar systems.

In this post I will show how to simulate a slow hardisk on Linux to increase your chances to reproduce race conditions.

The solution will be based on nbd and trickle and it will use the network layer to regulate the i/o throughput for your virtual hardisk.

I’d like to start adding that this isn’t anything new and that I’m not suggesting any particularly revolutionary approach. There are many blogpost out there that describe how to achieve this. But for multiple reasons, none of those that I have read worked out of the box on my Fedora 22 or Centos 6 installations.
That is the main reason that pushed me to give back to the internet, adding what might be, just another page on the argument.

Let’s start with the idea of using nbd or Network Block Device to simulate our hardisk.

As far as I understand there aren’t official ways, exposed by the linux Kernel to regulate the I/O speeds of generic block devices.

Over the internet you may find many suggestions, spanning from disabling read and write cache to geneate real load that could make your system busy.

QoS can be enforced on the network layer though. So the idea is to emulate a block device via network.

Despite this might sound very complicated (and maybe it is), the problem has been already been solved by the Linux Kernel with the nbd module.

Since on my Fedora 22 that module is not enabled automatically by default, we have to install it first, and then enable it:

# install nbd module
sudo yum install nbd

# load nbd module
sudo modprobe nbd

# check nbd module is really loaded
lsmod | grep nbd
nbd                    20480  0

Now that nbd is installed and the module loaded we create a configuration file for its daemon:

# run this command as root
"cat > /etc/nbd-server/config" <<EOF
    exportname = /home/pantinor/test_nbd
    copyonwrite = false

Where exportname is a path to a file that will represent your slow virtual hardisk.

You can create the file with this command:

# create an empty file, and reserve it 1GB of space
dd if=/dev/zero of=/home/pantinor/test_nbd bs=1G count=1

Now that config and the destination files are in place, you can start the nbd-server using daemon:

# start ndb-server daemon
sudo systemctl start nbd-server.service

# monitor the daaemon start up with:
journalctl -f --unit nbd-server.service

At this point you have a server network process, listening on port 10809 that any client over your network can connect to , to mount it as a network block device.

We can mount it with this this command:

# "test" corresponds to the configuration section in daemon  config file
sudo nbd-client -N test 10809  /dev/nbd0
# my Centos 6 version of nbd-client needs a slightly different synatx:
#    sudo nbd-client -N test   /dev/nbd0

Now we have created a virtual block device, called /dev/nbd0. Now we can format it like it was a normal one:

# format device
sudo mkfs /dev/nbd0 

# create folder for mounting
sudo mkdir /mnt/nbd

# mount device, sync option is important to not allow the kernel to cheat!
sudo mount -o sync /dev/nbd0 /mnt/nbd

# add write permissions to everyone
sudo chmod a+rwx /mnt/nbd

Not that we have passed to mount command the flag -o sync. This command has an important function: to disable an enhancement in the linux Kernel that delays the completion of write operations to the devices. Without that all the write operations will look like instantaneous, and the kernel will actually complete the write requests in background. With this flag instead, all the operation will wait until the operation has really completed.

You can check that now you are able to read and write on the mount point /mnt/nbd.

Let’s now temporarily unmount and disconnect from nbd-server:

sudo umount /mnt/nbd

sudo nbd-client -d /dev/nbd0

And let’s introduce trickle.

Trickle is a software you can use to wrap other processes and to limit their networking bandwidth.

You can use it to limit any other program. A simple test you can perform with it is to use it with curl:

# download a sample file and limits download speed to 50 KB/s
trickle -d 50 -u 50  curl -O

Now, as you can expect, we just need to join trickle and nbd-server behavior, to obtain the desired behavior.

Let’s start stopping current nbd-server daemon to free up its default port:

sudo systemctl stop nbd-server.service

And let’s start it via trickle:

# start nbd-server limiting its network throughput
trickle -d 20 -u 20 -v nbd-server -d

-d attaches the server process to the console, so the console will be blocked and it will be freed only one you close the process or when a client disconnects.
Ignore the error message: trickle: Could not reach trickled, working independently: No such file or directory

Now you can re-issue the commands to connect to nbd-server and re mount it:

sudo nbd-client -N test 10809  /dev/nbd0

sudo mount -o sync /dev/nbd0 /mnt/nbd

And you are done! Now you have a slow hardisk mounted on /dev/nbd0.

You can verify the slow behavior in this way:

sudo dd if=/dev/nbd0 of=/dev/null bs=65536 skip=100 count=10
10+0 records in
10+0 records out
655360 bytes (655 kB) copied, 18.8038 s, 34.9 kB/s

# when run against an nbd-server that doesn't use trickle the output is:
# 655360 bytes (655 kB) copied, 0.000723881 s, 905 MB/s

Now that you have a slow partition, you can just put the files of your sw there to simulate a slow i/o.

All the above steps can be converted to helper scripts that will make the process much simpler like those described here:

Monday, October 5, 2015

JBoss Fuse - Turn your static config into dynamic templates with MVEL

Recently I have rediscovered a JBoss Fuse functionality that I had forgotten about and I’ve thought that other people out there may benefit of this reminder.

This post will be focused on JBoss Fuse and Fabric8 but it might interest also all those developers that are looking for minimally invasive ways to add some degree of dynamic support to their static configuration files.

The idea of dynamic configuration in OSGi and in Fabric8

OSGi framework is more often remembered for its class-loading behavior. But a part of that, it also defines other concepts and functionality that the framework has to implement.
One of the is ConfigAdmin.

ConfigAdmin is a service to define an externalized set of properties files that are logically bounded to your deployment units.

The lifecycle of this external properties files is linked with OSGi bundle lifecycle: if you modify an external property file, your bundle will be notified.
Depending on how you coded your bundle you can decide to react to the notification and, programmatically or via different helper frameworks like Blueprint, you can invoke code that uses the new configuration.

This mechanism is handy and powerful, and all developers using OSGi are familiar with it.

Fabric8 builds on the idea of ConfigAdmin, and extends it.

With its provisioning capabilities, Fabric8 defines the concept of a Profile that encapsulates deployment units and configuration. It adds some layer of functionality on top of plain OSGi and it allows to manage any kind of deployment unit, not only OSGi bundles, as well as any kind of configuration or static file.

If you check the official documentation you will find the list of “extensions” that Fabric8 layer offers and you will learn that they are divided mainly in 2 groups: Url Handlers and Property Resolvers.

I suggest everyone that is interested in this technology to dig through the documentation; but to offer a brief summary and a short example, imagine that your Fabric profiles have the capability to resolve some values at runtime using specific placeholders.


# sample url handler usage, ResourceName is a filename relative to the namespace of the containing Profile:

# sample property handler, the value is read at deploy time, from the Apache Zookeeper distributed registry that is published when you run JBoss Fuse

There are multiple handlers available out of the box, covering what the developers thought were the most common use cases: Zookeeper, Profiles, Blueprint, Spring, System Properties, Managed Ports, etc.

And you might also think to extend the mechanism defining your own extension: for example you might want to react to performance metrics you are storing on some system, you can write an extension, with it’s syntax convention, that injects values taken from your system.

The limit of all this power: static configuration files

The capabilities I have introduced above are exciting and powerful but they have an implicit limit: they are available only to .properties files or to files that Fabric is aware of.

This means that those functionality are available if you have to manage Fabric Profiles, OSGi properties or other specific technology that interact with them like Camel, but they are not enabled for anything that is Fabric-Unaware.

Imagine you have your custom code that reads an .xml configuration file. And imagine that your code doesn’t reference any Fabric object or service.
Your code will process that .xml file as-is. There won’t be any magic replacement of tokens or paths, because despite you are running inside Fabric, you are NOT using any directly supported technology and you are NOT notifying Fabric, that you might want its services.

To solve this problem you have 3 options:

  1. You write an extension to Fabric to handle and recognise your static resources and delegates the dynamic replacement to the framework code.
  2. You alter the code contained in your deployment unit, and instead of consuming directly the static resources you ask to Fabric services to interpolate them for you
  3. *You use mvel: url handler (and avoid touching any other code!)

What is MVEL ?

MVEL is actually a programming language: .
In particular it’s also scripting language that you can run directly from source skipping the compilation step.
It actually has multiple specific characteristics that might make it interesting to be embedded within another application and be used to define new behaviors at runtime. For all these reasons, for example, it’s also one of the supported languages for JBoss Drools project, that works with Business Rules you might want to define or modify at runtime.

Why it can be useful to us? Mainly for 2 reasons:

  1. it works well as a templating language
  2. Fabric8 has already a mvel: url handler that implicitly, acts also as a resource handler!

Templating language

Templating languages are those family of languages (often Domain Specific Languages) where you can altern static portion of text that is read as-is and dynamic instructions that will be processed at parsing time. I’m probably saying in a more complicated way the same idea I have already introduced above: you can have tokens in your text that will be translated following a specific convention.

This sounds exactly like the capabilities provided by the handlers we have introduced above. With an important difference: while those were context specific handler, MVEL is a general purpose technology. So don’t expect it to know anything about Zookeeper or Fabric profiles, but expect it to be able to support generic programming language concepts like loops, code invocation, reflection and so on.

Fabric supports it!

A reference to the support in Fabric can be find here:

But let me add a snippet of the original code that implements the functionality, since this is the part where you might found this approach interesting even outside the context of JBoss Fuse:

public InputStream getInputStream() throws IOException {
  String path = url.getPath();
  URL url = new URL(path);
  CompiledTemplate compiledTemplate = TemplateCompiler.compileTemplate(url.openStream());
  Map<String, Object> data = new HashMap<String, Object>();
  Profile overlayProfile = fabricService.get().getCurrentContainer().getOverlayProfile();
  data.put(“profile”, Profiles.getEffectiveProfile(fabricService.get(), overlayProfile));
  data.put(“runtime”, runtimeProperties.get());
  String content = TemplateRuntime.execute(compiledTemplate, data).toString();
  return new ByteArrayInputStream(content.getBytes());

What’s happening here?

First, since it’s not showed in the snippet, remember that this is a url handler. This means that the behavior get’s triggered for files that are referred to via a specific uri. In this case it’s mvel:. For example a valid path might be mvel:jetty.xml.

The other interesting and relatively simple thing to notice is the interaction with MVEL interpreter.
Like most of the templating technologies, even the simplest ones you can implement yourself you usually have:

  • an engine/complier, here it’s TemplateCompiler
  • a variable that contains your template, here it’s url
  • a variable that represent your context, that is the set of variables you want to expose to the engine, here data

Put them all together, asking the engine to do it’s job, here with TemplateRuntime.execute(...) and what you get in output is a static String. No longer the templating instructions, but all the logic your template was defining has been applied, and eventually, augmented with some of the additional input values taken from the context.

An example

I hope my explanation it’s been simple enough, but probably an example is the best way to express the concept.

Let’s use jetty.xml, contained in JBoss Fuse default.profile, that is a static resource that JBoss Fuse doesn’t handle as any special file, so it doesn’t offer any replacement functionality to it.

I will show both aspects of MVEL integration here: reading some value from the context variables and show how programmatic logic (just the sum of 2 integers here) can be used:

<Property name="jetty.port" default="@{  Integer.valueOf( profile.configurations['org.ops4j.pax.web']['org.osgi.service.http.port'] ) + 10  }"/>

We are modifying the default value for Jetty port, taking its initial value from the “profile” context variable, that is a Fabric aware object that has access to the rest of the configuration:


we explicitly cast it from String to Integer:

Integer.valueOf( ... )

and we add a static value of 10 to the returned value:

.. + 10

Let’s save the file, stop our fuse instance. Restart it and re-create a test Fabric:

# in Fuse CLI shell
shutdown -f

# in bash shell
rm -rf data instances


# in Fuse CLI shell
fabric:create --wait-for-provisioning

Just wait and monitor logs and… Uh-oh. An error! What’s happening?

This is the error:

2015-10-05 12:00:10,005 | ERROR | pool-7-thread-1  | Activator                        | 102 - org.ops4j.pax.web.pax-web-runtime - 3.2.5 | Unable to start pax web server: Exception while starting Jetty
java.lang.RuntimeException: Exception while starting Jetty
at org.ops4j.pax.web.service.jetty.internal.JettyServerImpl.start([103:org.ops4j.pax.web.pax-web-jetty:3.2.5]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)[:1.7.0_76]
at sun.reflect.NativeConstructorAccessorImpl.newInstance([:1.7.0_76]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance([:1.7.0_76]
at java.lang.reflect.Constructor.newInstance([:1.7.0_76]
at org.eclipse.jetty.xml.XmlConfiguration$JettyXmlConfiguration.set([96:org.eclipse.jetty.aggregate.jetty-all-server:8.1.17.v20150415]
at org.eclipse.jetty.xml.XmlConfiguration$JettyXmlConfiguration.configure([96:org.eclipse.jetty.aggregate.jetty-all-server:8.1.17.v20150415]
Caused by: java.lang.NumberFormatException: For input string: “@{profile.configurations[’org.ops4j.pax.web'][‘org.osgi.service.http.port’] + 1}”
at java.lang.NumberFormatException.forInputString([:1.7.0_76]
at java.lang.Integer.parseInt([:1.7.0_76]
at java.lang.Integer.<init>([:1.7.0_76]
… 29 more

If you notice, the error message says that our template snippet cannot be converted to a Number.

Why our template snippet is displayed in first instance? The templating engine should have done its part of the job and give us back a static String without any reference to templating directives!

I have showed you this error on purpose, to insist on a concept I have described above but that might get uncaught on first instance.

MVEL support in Fabric, is implemented as an url handler.

So far we have just modified the content of a static resource file, but we haven’t given any hint to Fabric that we’d like to handle that file as a mvel template.

How to do that?

It’s just a matter of using the correct uri to refer to that same file.

So, modify the file default.profile/ that is the place in the default Fabric Profile where you define which static file contains Jetty configuration:

# change it from org.ops4j.pax.web.config.url=profile:jetty.xml to

Now, stop the instance again, remove the Fabric configuration files, recreate a Fabric and notice how you Jetty instance is running correctly.

We can check it in this way:

JBossFuse:karaf@root> config:list | grep org.osgi.service.http.port
   org.osgi.service.http.port = 8181

While from your browser you can verify, that Hawtio, JBoss Fuse web console that is deployed on top Jetty, is accessible to port 8191: http://localhost:8191/hawtio