Monday, October 19, 2015

Debugging tip: How to simulate a slow hard disk

As a software engineer, there are times when you’d like to have a slower system.

It doesn’t happen very often: usually it’s when someone reports a bug in your software that you have never seen before and cannot reproduce.

Most of the time, the reason for these ghost bugs is a race condition.

Race conditions are issues you might face with multithreaded programming. Imagine that your software does multiple things at the same time. Although most of the time those things happen in the expected, intuitive order, sometimes they don’t, leaving your program in an unexpected state.

They are indeed bugs, and they are the developer’s fault, but the developer doesn’t have many ways to protect himself from them. There are programming styles and technologies that push you to avoid the risk altogether, but I think that, in general, they are a condition every developer has to be familiar with.

So why does a slower system help?

Because most of the time, race conditions are masked by the fact that operations still happen “reasonably quickly”. This ambiguous “reasonably quickly” is the main issue: there is no clear limit or number that tells you how quickly is quick enough. You just have a higher chance of seeing race conditions when things are slow enough to reveal that they are not happening in the correct order, or are not waiting at the correct checkpoints.
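To make this concrete, here is a tiny, contrived shell sketch (not taken from any real application) of a timing-dependent bug: one part of a program saves a result file in the background, while another part assumes the file is already there when it reads it.

# contrived example of a timing-dependent "ghost bug"
rm -f /tmp/result.txt
( echo "done" > /tmp/result.txt ) &   # background "save", usually completes very quickly
cat /tmp/result.txt 2>/dev/null || echo "BUG: read happened before the write completed"
wait

On a fast disk the background write usually wins the race and the bug stays hidden; slow the disk down and the wrong order shows up much more often.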

In my experience with Java applications, the main performance-related factor when reproducing race conditions is disk access speed. More than CPU speed or the amount of RAM, I have noticed that disk speed is the biggest differentiator between otherwise similar systems.
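If you want a rough idea of how the disks of two machines compare, a quick and dirty write test with dd is enough (the file name here is arbitrary, and oflag=dsync forces each block to actually reach the disk):

# rough disk write speed test: write 100 MB and report the throughput
dd if=/dev/zero of=/tmp/disk_speed_test bs=1M count=100 oflag=dsync
rm /tmp/disk_speed_test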

In this post I will show how to simulate a slow hard disk on Linux to increase your chances of reproducing race conditions.

The solution is based on nbd and trickle, and it uses the network layer to regulate the I/O throughput of your virtual hard disk.

I’d like to start by saying that this isn’t anything new and that I’m not suggesting a particularly revolutionary approach. There are many blog posts out there that describe how to achieve this, but for various reasons none of those that I have read worked out of the box on my Fedora 22 or CentOS 6 installations.
That is the main reason that pushed me to give back to the internet what might be just another page on the topic.

Let’s start with the idea of using nbd, or Network Block Device, to simulate our hard disk.

As far as I understand, there is no official way exposed by the Linux kernel to regulate the I/O speed of a generic block device.

On the internet you can find many suggestions, ranging from disabling read and write caches to generating real load to keep your system busy.

QoS can be enforced at the network layer, though. So the idea is to emulate a block device over the network.

Although this might sound very complicated (and maybe it is), the problem has already been solved by the Linux kernel with the nbd module.

Since on my Fedora 22 nbd is not installed and the module is not loaded by default, we have to install it first and then load it:

# install nbd module
sudo yum install nbd

# load nbd module
sudo modprobe nbd

# check nbd module is really loaded
lsmod | grep nbd
nbd                    20480  0

Now that nbd is installed and the module loaded, we create a configuration file for its daemon:

# run this command as root
cat > /etc/nbd-server/config <<EOF
[generic]
[test]
    exportname = /home/pantinor/test_nbd
    copyonwrite = false
EOF

Here exportname is the path to a file that will represent your slow virtual hard disk.

You can create the file with this command:

# create an empty file and reserve 1 GB of space for it
dd if=/dev/zero of=/home/pantinor/test_nbd bs=1G count=1
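As an alternative to dd, if a sparse file is good enough for you (blocks are allocated only when they are actually written), truncate gives the same result:

# alternative: create a 1 GB sparse file
truncate -s 1G /home/pantinor/test_nbd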

Now that the config and the destination file are in place, you can start the nbd-server daemon:

# start nbd-server daemon
sudo systemctl start nbd-server.service

# monitor the daemon start-up with:
journalctl -f --unit nbd-server.service

At this point you have a server process listening on port 10809, which any client on your network can connect to and mount as a network block device.
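Before connecting a client, you can double check that the daemon is really listening on that port (assuming the ss tool from iproute is available; netstat works too):

# verify that nbd-server is listening on its default port
sudo ss -tlnp | grep 10809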

We can connect to it with this command:

# "test" corresponds to the configuration section in daemon  config file
sudo nbd-client -N test  127.0.0.1 10809  /dev/nbd0
# my CentOS 6 version of nbd-client needs a slightly different syntax:
#    sudo nbd-client -N test  127.0.0.1   /dev/nbd0

We have now created a virtual block device called /dev/nbd0, and we can format it as if it were a normal one:

# format device
sudo mkfs /dev/nbd0 

# create folder for mounting
sudo mkdir /mnt/nbd

# mount device, sync option is important to not allow the kernel to cheat!
sudo mount -o sync /dev/nbd0 /mnt/nbd

# add write permissions to everyone
sudo chmod a+rwx /mnt/nbd

Note that we have passed the -o sync flag to the mount command. This flag has an important function: it disables an optimization in the Linux kernel that delays the completion of write operations to the device. Without it, all write operations would look instantaneous, and the kernel would actually complete the write requests in the background. With this flag, every write instead waits until the operation has really completed.

You can check that you are now able to read and write on the mount point /mnt/nbd.
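For example, a quick sanity check (the file name is arbitrary):

# write a small file through the mount point and read it back
echo "hello" > /mnt/nbd/check.txt
cat /mnt/nbd/check.txt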

Let’s now temporarily unmount it and disconnect from the nbd-server:

sudo umount /mnt/nbd

sudo nbd-client -d /dev/nbd0

And let’s introduce trickle.

Trickle is a tool you can use to wrap other processes and limit their network bandwidth.

You can use it to limit any program. A simple test you can perform is to run curl through it:

# download a sample file, limiting download and upload speed to 50 KB/s
trickle -d 50 -u 50  curl -O  http://download.thinkbroadband.com/5MB.zip

Now, as you might expect, we just need to combine trickle and nbd-server to obtain the desired behavior.

Let’s start by stopping the current nbd-server daemon to free up its default port:

sudo systemctl stop nbd-server.service

And let’s start it via trickle:

# start nbd-server limiting its network throughput
trickle -d 20 -u 20 -v nbd-server -d

-d attaches the server process to the console, so the console will be blocked and will be freed only once you close the process or a client disconnects.
Ignore the error message “trickle: Could not reach trickled, working independently: No such file or directory”.

Now you can re-issue the commands to connect to the nbd-server and remount it:

sudo nbd-client -N test  127.0.0.1 10809  /dev/nbd0

sudo mount -o sync /dev/nbd0 /mnt/nbd

And you are done! Now you have a slow hard disk, /dev/nbd0, mounted on /mnt/nbd.

You can verify the slow behavior in this way:

sudo dd if=/dev/nbd0 of=/dev/null bs=65536 skip=100 count=10
10+0 records in
10+0 records out
655360 bytes (655 kB) copied, 18.8038 s, 34.9 kB/s

# when run against an nbd-server that doesn't use trickle the output is:
# 655360 bytes (655 kB) copied, 0.000723881 s, 905 MB/s
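You can also verify that writes through the mount point are throttled in the same way (oflag=dsync makes dd wait for each block to really reach the device; the filesystem is already mounted with -o sync, so this just makes the test explicit):

# write 10 blocks of 64 KB through the slow device; expect a few KB/s
dd if=/dev/zero of=/mnt/nbd/write_test bs=65536 count=10 oflag=dsync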

Now that you have a slow partition, you can just put your software’s files there to simulate slow I/O.
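When you are done, the teardown is simply the reverse of the setup, reusing the same commands shown earlier in this post:

# unmount the filesystem and disconnect the client
sudo umount /mnt/nbd
sudo nbd-client -d /dev/nbd0

# the trickle-wrapped nbd-server was started in the foreground, so just stop it with Ctrl+C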

All the above steps can be converted to helper scripts that make the process much simpler, like those described here: http://philtortoise.blogspot.it/2013/09/simulating-slow-drive.html.

5 comments:

  1. If you run your app in a kvm virtual machine, it is much easier:

    https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Tuning_and_Optimization_Guide/sect-Virtualization_Tuning_Optimization_Guide-BlockIO-Techniques.html

    Replies
    1. Thanks for your suggestion.

      I was hoping to gather alternative approaches to the same problem so that people can choose the one that best fits their use case.

  2. Very interesting and non-standard approach. Thank you for sharing this info! Also bandwidth limiting probably can be done via pv -L.

    P.S. There are some typos in the article like "ndb" instead of "nbd".

  3. Thank you for your comment! I have fixed some typos.

  4. Thank you so much for posting this guide. I'm finding that for read tests, I'm having to set up a script to clear the disk cache continuously or else the speed goes up to the real disk speed. Have you found any better ways to get around that?
