Thursday, June 29, 2017

OAuth2, JWT, Open-ID Connect and other confusing things

Disclaimer

I feel I have to start this post with an important disclaimer: don't trust too much what I'm about to say.
The reason I say this is that we are discussing security, and when you talk about security, anything other than 100% correct statements risks exposing you to trouble of some sort.
So please read this article keeping in mind that your source of truth should be the official specifications, and that this is just an overview I use to recap the topic in my own head and to introduce it to beginners.

Mission

I have decided to write this post because I have always found OAuth2 confusing. Even now that I know a little more about it, I still find some of its parts puzzling.
Even though I was able to follow online tutorials from the likes of Google or Pinterest whenever I needed to fiddle with their APIs, it always felt like some sort of voodoo, with all those codes and Bearer tokens.
And each time they mentioned that I could make my own decisions for specific steps, choosing among the standard OAuth2 approaches, my mind tended to go blank.

I hope I'll be able to fix a few ideas in place, so that from now on you will be able to follow OAuth2 tutorials with more confidence.

What is OAuth2?

Let’s start from the definition:

OAuth 2 is an authorisation framework that enables applications to obtain limited access to user accounts on an HTTP service.

The above sentence is reasonably understandable, but we can improve things if we pinpoint the chosen terms.

The Auth part of the name reveals itself to be Authorisation (it could have been Authentication; it's not).
Framework can be easily overlooked, since the term is often abused; but the idea to keep here is that it's not necessarily a final product or something entirely defined. It's a toolset. A collection of ideas, approaches and well-defined interactions that you can use to build something on top of!
It enables applications to obtain limited access. The key here is that it enables applications, not humans.
limited access to user accounts is probably the key part of the definition that can help you remember and explain what OAuth2 is:
the main aim is to allow a user to delegate access to a user-owned resource, delegating it to an application.

OAuth2 is about delegation.

It's about a human instructing a piece of software to do something on her behalf.
The definition also mentions limited access, so you can imagine being able to delegate just part of your capabilities.
And it concludes by mentioning HTTP services: this authorisation delegation happens on an HTTP service.

Delegation before OAuth2

Now that the context should be clearer, we could ask ourselves: How were things done before OAuth2 and similar concepts came out?

Well, most of the time, it was as bad as you can guess: with a shared secret.

If I wanted software A to be granted access to my stuff on server B, most of the time the approach was to give my username and password to software A, so that it could use them on my behalf.
This is still a pattern you can see in much modern software, and I personally hope it's something that makes you uncomfortable.
You know what they say: if you share a secret, it’s no longer a secret!

Now imagine if you could instead create a new username/password pair for each service you need to share something with. Let's call them ad-hoc passwords.
They are something different from your main account for a specific service, but they still allow access to that service as if they were you. You would be able, in this case, to delegate, but you would still be responsible for keeping track of all these new application-only accounts you need to create.

OAuth2 - Idea

Keeping in mind that the business problem we are trying to solve is the "delegation" one, we want to extend the ad-hoc password idea so as to take the burden of managing those passwords away from the user.
OAuth2 calls these ad-hoc passwords tokens.
Tokens are actually more than that, and I'll try to illustrate how, but it might be useful to associate them with this simpler idea of an ad-hoc password to begin with.

OAuth2 - Core Business

OAuth2's core business is about:

  • how to obtain tokens

OAuth2 - What’s a token?

Since everything seems to focus around tokens, what’s a token?
We have already used the analogy of the ad-hoc password, which has served us well so far, but maybe we can do better.
What if we look for the answer inside the OAuth2 specs?
Well, prepare to be disappointed. The OAuth2 specs do not give you the details of how to define a token. How is this even possible?
Remember when we said that OAuth2 was "just a framework"? Well, this is one of those situations where that definition matters!
The specs just give you the logical definition of what a token is and describe some of the capabilities it needs to possess.
But in the end, what the specs say is that a token is a string. A string containing credentials to access a resource.
They give some more detail, but it can be said that most of the time it's not really important what's in a token, as long as the application is able to consume it.

A token is the thing that allows an application to access the resource you are interested in.

To point out how you can avoid overthinking what a token is, the specs also explicitly say that it "is usually opaque to the client"!
They are practically telling you that you are not even required to understand tokens!
Fewer things to keep in mind doesn't sound bad!

But to avoid turning this into a pure philosophy lesson, let's show what a token could be:

{
   "access_token": "363tghjkiu6trfghjuytkyen",
   "token_type": "Bearer"
}

A quick glimpse shows us that, yeah, it's a string. JSON-like, but that's probably just because JSON is popular these days, not necessarily a requirement.
We can spot a section with what looks like a random string, an id: 363tghjkiu6trfghjuytkyen. Programmers know that when you see something like this, at least when the string is not too long, it's probably a sign that it's just a key you can correlate with more detailed information stored somewhere else.
And that is true in this case too.
More specifically, the additional information will be the details of the specific authorisation that this code represents.

But then another thing should capture your attention: "token_type": "Bearer".

Your reasonable questions should be: what are the characteristics of a Bearer token type? Are there other types? Which ones?

Luckily for our efforts to keep things simple, the answer is easy (some may say so easy that it's confusing…)

The specs only talk about the Bearer token type!

Uh, so why did the person who designed tokens this way feel they had to specify the only known value?
You might start seeing a pattern here: because OAuth2 is just a framework!
It suggests how to do things, and it does some of the heavy lifting for you by making some choices, but in the end you are responsible for using the framework to build what you want.
We are just saying that, even though we only talk about Bearer tokens here, it doesn't mean you can't define your own custom type, with whatever meaning you decide to attribute to it.

Okay, just a single type. But that is a curious name. Does the name imply anything relevant?
Maybe this is a silly question, but for non-native English speakers like me, what Bearer means in this case could be slightly confusing.

Its meaning is quite simple actually:

A Bearer token works like this: if you have a valid token, we trust you. No questions asked.

So simple it's confusing. You might be arguing: "well, all the token-like objects in the real world work that way: if I have valid money, you exchange it for the goods you sell".

Correct. That’s a valid example of a Bearer Token.

But not every token is of the Bearer kind. A flight ticket, for example, is not a Bearer token.
Having a ticket is not enough to be allowed to board a plane: you also need to show a valid ID that your ticket can be matched against; and only if your name matches the ticket, and your face matches the ID card, are you allowed to get on board.

To wrap this up: we are working with a kind of token where possessing one is enough to get access to a resource.

And to keep you thinking: we said that OAuth2 is about delegation. Tokens with this characteristic are clearly handy if you want to hand them to someone else to delegate access.

A token analogy

Once again, it might be my non-native English speaker background that makes me want to clarify this.
When I look up the first translation of token in Italian, my first language, I'm pointed to a physical object.
Something like this:

Token

That, specifically, is an old token, used to make phone calls in public telephone booths.
Despite being a Bearer token, its analogy with the OAuth2 tokens is quite poor.
A much better picture has been drawn by Tim Bray in this old post: An Hotel Key is an Access Token.
I suggest you read the article directly, but the main idea is that, compared with the physical metal coin linked above, a software token is something that can have a lifespan, can be disabled remotely and can carry information.

Actors involved

These are our actors:

  • Resource Owner
  • Client (aka Application)
  • Authorisation Server
  • Protected Resource

It should be relatively intuitive: an Application wants to access a Protected Resource owned by a Resource Owner. To do so, it requires a token. Tokens are issued by an Authorisation Server, a third-party entity that all the other actors trust.

Usually, when I read something new, I tend to skip quickly through the actors of a system. I probably shouldn't, but most of the time the paragraph describing, for example, a "User" ends up using many words just to tell me that it's, well, a user… So I look for the terms that are less intuitive and check whether any of them has some characteristic I should pay particular attention to.

In OAuth2's specific case, I feel that the actor with the most confusing name is the Client.
Why do I say so? Because, in everyday life (and in IT), it can mean many different things: a user, a specialised piece of software, a very generic piece of software…

I prefer to classify it in my mind as Application.

Let me stress that the Client is the Application we want to delegate our permissions to. So, if the Application is, for example, a server-side web application we access via a browser, the Client is not the user or the browser itself: the Client is the web application running in its own environment.

I think this is very important. The term Client is all over the place, so my suggestion is not to replace it entirely, but to force your brain to keep the relationship Client = Application in mind.

I also like to think that there is another, unofficial, Actor: the User-Agent.

I hope I won't confuse people here, because this is entirely something I use to build my own mental map.
Despite not being defined in the specs, and not being present in all the different flows, it can help to identify this fifth Actor in OAuth2 flows.
The User-Agent is, most of the time, played by the Web Browser. Its responsibility is to enable indirect propagation of information between two systems that are not talking to each other directly.
The idea is: A should talk to B, but it's not allowed to do so. So A tells C (the User-Agent) to tell B something.

It might still be a little confusing at the moment, but I hope I'll be able to clarify this later.

OAuth2 Core Business 2

OAuth2 is about how to obtain tokens.

Even if you are not an expert on OAuth2, as soon as someone mentions the topic you probably think of those pages from Google or other major service providers that pop up when you try to log in to a new service where you don't yet have an account, asking you to tell Google that, yeah, you trust that service and want to delegate some of the permissions you have on Google to it.

This is correct, but it's just one of the multiple possible interactions that OAuth2 defines.

There are 4 main ones it's important you know. And this might come as a surprise if it's the first time you hear it:
not all of them end up showing you the Google-like permissions screen!
That's because you might want to leverage the OAuth2 approach even from a command-line tool, maybe one without any UI at all capable of displaying an interactive web page to delegate permissions.

Remember once again: the main goal is to obtain tokens!

If you find a way to obtain one (the "how" part) and you are able to use it, you are done.

As we were saying, there are 4 ways defined by the OAuth2 framework. Sometimes they are called flows, sometimes they are called grants.
It doesn't really matter what you call them. I personally use flow, since it helps me remember that they differ from one another in the interactions you have to perform with the different actors to obtain tokens.

They are:

  • Authorisation Code Flow
  • Implicit Grant Flow
  • Client Credential Grant Flow
  • Resource Owner Credentials Grant Flow (aka Password Flow)

Each one of them is the suggested flow for specific scenarios.
To give you an intuitive example, there are situations where your Client is able to keep a secret (a server-side web application) and others where it technically can't (a client-side web application whose code you can entirely inspect with a browser).
Environmental constraints like these would make some of the steps defined in the full flow insecure (and useless). So, to keep things simpler, other flows have been defined in which the interactions that were impossible, or that were not adding any security-related value, have been entirely skipped.

OAuth2 Poster Boy: Authorisation Code Flow

We will start our discussion with Authorisation Code Flow for three reasons:

  • it’s the most famous flow, and the one that you might have already interacted with (it’s the Google-like delegation screen one)
  • it’s the most complex, articulated and inherently secure
  • the other flows are easier to reason about, when compared to this one

The Authorisation Code Flow is the one you should use if your Client is trusted and is able to keep a secret. This means a server-side web application.

How to get a token with Authorisation Code Flow

  1. All the involved Actors trust the Authorisation Server
  2. The User (Resource Owner) tells a Client (Application) to do something on his behalf
  3. The Client redirects the User to the Authorisation Server, adding some parameters: redirect_uri, response_type=code, scope, client_id
  4. The Authorisation Server asks the User whether he wishes to grant the Client access to some resource on his behalf (delegation) with specific permissions (scope)
  5. The User accepts the delegation request, so the Authorisation Server sends an instruction to the User-Agent (Browser) to redirect to the URL of the Client, injecting a code=xxxxx parameter into this HTTP redirect
  6. The Client, activated by the User-Agent thanks to the HTTP redirect, now talks directly to the Authorisation Server (bypassing the User-Agent), sending it client_id, client_secret and the code that was forwarded to it
  7. The Authorisation Server returns to the Client (not the browser) a valid access_token and a refresh_token

This is so articulated that it’s also called the OAuth2 dance!

Let’s underline a couple of points:

  • At step 3 we specify, among the other params, a redirect_uri. This is used to implement the indirect communication we anticipated when we introduced the User-Agent as one of the actors. It's a key piece of information if we want to allow the Authorisation Server to forward information to the Client without a direct network connection open between the two.
  • The scope requested at step 3 is the set of permissions the Client is asking for.
  • Remember that this is the flow you use when the Client is able to keep a secret. That matters at step 6, when the communication between the Client and the Authorisation Server avoids passing through the less secure User-Agent (which could sniff or tamper with the exchange). This is also why it makes sense for the Client to add an extra layer of security by sending its client_secret, which is shared only between it and the Authorisation Server (see the curl sketch after these notes).
  • The refresh_token is used for subsequent automated calls the Client might need to make to the Authorisation Server: when the current access_token expires and a new one is needed, sending a valid refresh_token allows the Client to avoid asking the User to confirm the delegation again.
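
To make steps 3 and 6 a little more concrete, here is a hedged sketch of what they often look like on the wire. The hostnames, client values and endpoint paths are hypothetical, purely for illustration; real providers document their own endpoints:

# Step 3: the Client redirects the User-Agent to the Authorisation Server
# (a browser navigation, shown here just as the URL being built):
#   https://auth.example.com/authorize?response_type=code&client_id=my-client-1&scope=scope1&redirect_uri=https://client.example.com/callback

# Step 6: the Client exchanges the code for tokens, server to server,
# authenticating itself with its client_id and client_secret:
curl -u my-client-1:my-client-secret \
     -d grant_type=authorization_code \
     -d code=xxxxx \
     -d redirect_uri=https://client.example.com/callback \
     https://auth.example.com/token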

OAuth2 Got a token, now what?

OAuth2 is a framework, remember? What does the framework tell me to do now?

Well, nothing. =P

It’s up to the Client developer.

She could (and often should):

  • check whether the token is still valid
  • look up detailed information about who authorised this token
  • look up the permissions associated with that token
  • perform any other operation that makes sense before finally giving access to a resource

They are all valid, pretty obvious points, right?
Does the developer have to figure out on her own the best set of operations to perform next?
She definitely can. Otherwise, she can leverage another specification: OpenID Connect (OIDC). More on this later.
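
In practice, those lookups are often done with the companion Token Introspection spec (RFC 7662) rather than with anything defined by OAuth2 itself: you ask the Authorisation Server what it knows about the token. A hedged sketch, with a hypothetical endpoint and credentials:

# ask the Authorisation Server what it knows about an access_token
curl -u my-client-1:my-client-secret \
     -d token=363tghjkiu6trfghjuytkyen \
     https://auth.example.com/introspect

# sample response
{"active": true, "scope": "scope1 scope2", "client_id": "my-client-1", "username": "paolo", "exp": 1440538996}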

OAuth2 - Implicit Grant Flow

It's the flow designed for Client applications that can't keep a secret. An obvious example is a client-side HTML application. But any binary application whose code is exposed to the public can also be manipulated to extract its secrets.
Couldn't we have reused the Authorisation Code Flow?
Yes, but… what's the point of step 6 if the secret is not a secure secret anymore? We wouldn't get any protection from that additional step!
So the Implicit Grant Flow is similar to the Authorisation Code Flow, but it skips that now-pointless exchange:
it aims to obtain access_tokens directly, without the intermediate step of obtaining a code first to be exchanged, together with a secret, for an access_token.

It uses response_type=token to specify which flow to use while contacting the Authorisation Server.
Note also that there is no refresh_token: it's assumed that user sessions will be short (due to the less secure environment) and that, anyhow, the user will still be around to re-confirm his will to delegate (this was the main use case that led to the definition of refresh_tokens).
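
As a hedged sketch with hypothetical hosts, the whole flow collapses into a single front-channel exchange: the Client sends the User-Agent to the Authorisation Server, and the token comes straight back in the redirect fragment:

# authorisation request (a browser navigation, not a back-channel call)
#   https://auth.example.com/authorize?response_type=token&client_id=my-spa&scope=scope1&redirect_uri=https://spa.example.com/callback

# redirect coming back to the Client, with the token in the URL fragment
#   https://spa.example.com/callback#access_token=363tghjkiu6trfghjuytkyen&token_type=Bearer&expires_in=3600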

OAuth2 - Client Credential Grant Flow

What if we don't have a Resource Owner, or if he's indistinguishable from the Client software itself (a 1:1 relationship)?
Imagine a backend system that just wants to talk to another backend system. No Users involved.
The main characteristic of such an interaction is that it's no longer interactive, since there is no user to ask for confirmation of the will to delegate.
It also implicitly defines a more secure environment, where you don't have to worry about users getting a chance to read secrets.

Its grant type is grant_type=client_credentials.

We are not detailing it here; just be aware that it exists and that, like the previous flow, it's a variation, a simplification actually, of the full OAuth2 dance, suggested for use when your scenario allows it.
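
Still, a hedged sketch of what such a backend-to-backend token request often looks like (hypothetical endpoint and credentials) may help: the Client authenticates with its own credentials and gets a token back, with no user in the picture:

curl -u my-backend-client:my-backend-secret \
     -d grant_type=client_credentials \
     -d scope=scope1 \
     https://auth.example.com/token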

OAuth2 - Resource Owner Credentials Grant Flow (aka Password Flow)

Please raise your attention here, because you are about to be confused.

This is the scenario:
The Resource Owner has an account on the Authorisation Server. The Resource Owner gives his account details to the Client. The Client uses these details to authenticate to the Authorisation Server…

=O

If you have followed the discussion so far, you might be asking whether I'm kidding you.
This is exactly the anti-pattern we tried to move away from at the beginning of our OAuth2 exploration!

How can it possibly be listed here as a suggested flow?

The answer is quite reasonable, actually: it's a possible first stop in a migration from a legacy system.
And it's actually a little better than the shared-password anti-pattern:
the password is still shared, but only as a means to start the OAuth2 dance used to obtain tokens.
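
A hedged sketch, with hypothetical values: the Resource Owner's credentials travel once, in exchange for tokens, and from then on only tokens are used:

curl -u my-legacy-client:my-legacy-secret \
     -d grant_type=password \
     -d username=paolo \
     -d password=my-real-password \
     https://auth.example.com/token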

This allows OAuth2 to get its foot in the door when we don't have better alternatives.
It introduces the concept of access_tokens, and it can be used until the architecture is mature enough (or the environment changes) to allow a better and more secure flow for obtaining tokens.
Also, please notice that now the token is the ad-hoc password that reaches the Protected Resource system, while in the fully shared-password anti-pattern it was our real password that had to be forwarded.

So, far from ideal, but at least justified by some criteria.

How to choose the best flow?

There are many decision flow diagrams on the internet. One of those that I like the most is this one:

OAuth2 Flows from https://auth0.com

It should help you remember the brief descriptions I have given you here and choose the easiest flow based on your environment.

OAuth2 Back to tokens - JWT

So, we are able to get tokens now. We have multiple ways to get them. We have not been told explicitly what to do with them, but with some extra effort and a bunch of additional calls to the Authorisation Server we can arrange something and obtain useful information.

Could things be better?

For example, we have assumed so far that our tokens might look like this:

{
   "access_token": "363tghjkiu6trfghjuytkyen",
   "token_type": "Bearer"
}

Could we have more information in it, so as to save us some round trips to the Authorisation Server?

Something like the following would be better:

{
  "active": true,
  "scope": "scope1 scope2 scope3",
  "client_id": "my-client-1",
  "username": "paolo",
  "iss": "http://keycloak:8080/",
  "exp": 1440538996,
  "roles" : ["admin", "people_manager"],
  "favourite_color": "maroon",
  ... : ...
}

We'd be able to directly access some information tied to the Resource Owner's delegation.

Luckily someone else had the same idea, and they came up with JWT - JSON Web Tokens.
JWT is a standard that defines the structure of JSON-based tokens representing a set of claims. Exactly what we were looking for!

Actually, the most important thing the JWT spec gives us is not the payload we have exemplified above, but the capability to trust the whole token without involving the Authorisation Server!

How is that even possible? The idea is not a new one: asymmetric signing (public-key cryptography), defined in the context of JWT by the JOSE specs.

Let me refresh this for you:

In asymmetric signing, two keys are used to verify the validity of information.
These two keys are coupled: one is secret, known only to the document creator, while the other is public.
The secret one is used to produce a signed fingerprint of the document (a hash).
When the document is sent to its destination, the reader uses the public key associated with the secret one to verify whether the document and the fingerprint he has received are valid.
Digital signing algorithms tell us that the document is valid, according to the public key, only if it has been signed with the corresponding secret key.

The overall idea is: if our local verification passes, we can be sure that the message has been published by the owner of the secret key, so it’s implicitly trusted.

And back to our tokens use case:

We receive a token. Can we trust it? We verify the token locally, without needing to contact the issuer. If, and only if, the verification based on the trusted public key passes, we confirm that the token is valid. No questions asked. If the token is valid according to the digital signature AND it's alive according to its declared lifespan, we can take that information as true and we don't need to ask the Authorisation Server for confirmation!

As you can imagine, since we put all this trust in the token, it might be wise not to issue tokens with an excessively long lifespan:
someone might have changed his delegation preferences on the Authorisation Server, and that information might not have reached the Client, which still holds a valid, signed token it can base its decisions on.
Better to keep things a little more in sync by issuing tokens with a shorter lifespan, so that outdated preferences don't risk being trusted for long periods.
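
To make this less abstract, here is a hedged sketch of what a signed JWT looks like on the wire and how you can peek inside it from a shell. The token value is a minimal, hypothetical one, and verifying the signature would additionally require the issuer's public key, which is left out here:

# a JWT is three base64url-encoded parts joined by dots: header.payload.signature
TOKEN="eyJhbGciOiJSUzI1NiJ9.eyJleHAiOjE0NDA1Mzg5OTZ9.signature-omitted"

# base64url uses '-' and '_' where plain base64 uses '+' and '/'
echo "$TOKEN" | cut -d '.' -f1 | tr '_-' '/+' | base64 -d; echo    # -> {"alg":"RS256"}
echo "$TOKEN" | cut -d '.' -f2 | tr '_-' '/+' | base64 -d; echo    # -> {"exp":1440538996}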

OpenID Connect

I hope this section won’t disappoint you, but the article was already long and dense with information, so I’ll keep it short on purpose.

OAuth2 + JWT + JOSE ~= OpenID Connect

Once again: OAuth2 is a framework.
The OAuth2 framework is used in conjunction with the JWT spec, JOSE and other ideas we are not going to detail here, to create the OpenID Connect specification.

The idea you should take away is that, more often than not, you are probably interested in using and leveraging OpenID Connect, since it puts together the best of the approaches and ideas described here.
You are, yes, leveraging OAuth2, but you are now within the much better defined bounds of OpenID Connect, which gives you richer tokens and support for Authentication, something never covered by plain OAuth2.

Some online services let you choose between OAuth2 and OpenID Connect. Why is that?
Well, when they mention OpenID Connect, you know you are using a standard; something that will behave the same way even if you switch implementations.
The OAuth2 option you are given is probably something very similar, potentially with some killer feature you might be interested in, but custom built on top of the more generic OAuth2 framework.
So be careful with your choice.

Conclusion

If you are interested in this topic, or if this article has only confused you more, I suggest you check out OAuth 2 in Action by Justin Richer and Antonio Sanso.
On the other hand, if you want to test your fresh knowledge and try to apply it to an open source Authorisation Server, I definitely recommend playing with Keycloak, which is capable of everything we have described here and much more!

Tuesday, May 24, 2016

JBoss Fuse: dynamic Blueprint files with JEXL

In this post I’ll show how to add a little bit of inline scripting in your Apache Aries Blueprint xml files.

I wouldn't necessarily call it a best practice, but I have always had the idea that this capability might be useful; I probably started wanting it when I was forced to use XML to simulate imperative programming structures, like when using Apache Ant.

And I have found the idea validated in projects like Gradle or Vagrant, where a full programming language is actually hiding in disguise, pretending to be a Domain Specific Language or a surprisingly flexible configuration syntax.

I have talked in the past about something similar, when showing how to use MVEL in JBoss Fuse.
This time I will limit myself to showing how to use small snippets of code that can be inlined in your otherwise static XML files, a trick that might turn out useful when you need to perform simple operations like string replacement, arithmetic or anything else for which you want to avoid writing a Java class.

Let me say that I'm not inventing anything new here. I'm just showing how to use a functionality provided directly by the Apache Aries project, but one that I haven't seen used that often out there.

The goal is to allow you to write a snippet like this:

[The original Blueprint XML snippet was lost in the blog's HTML rendering; it showed a value computed by an inline JEXL expression.]

You can see that we are invoking the java.lang.String.replaceAll() method on the value of an environment variable.

We can do this thanks to the Apache Aries Blueprint JEXL Evaluator, an extension to Apache Aries Blueprint that implements a custom token processor "extending" the base functionality of Aries Blueprint.

In this specific case, it does so by delegating the token interpolation to the Apache JEXL project.

JEXL, the Java Expression Language, is just a library that exposes scripting capabilities to the Java platform. It's not unique in what it does, since you could achieve the same with the native support for JavaScript, or with Groovy for instance. But we are going to use it since the integration with Blueprint has already been written, so we can use it straight away on our Apache Karaf or JBoss Fuse instance.

The following instructions have been verified on JBoss Fuse 6.2.1:

# install JEXL bundle
install -s mvn:org.apache.commons/commons-jexl/2.1.1 
# install JEXL Blueprint integration:
install -s mvn:org.apache.aries.blueprint/org.apache.aries.blueprint.jexl.evaluator/1.0.0

That was all the preparation we needed; now we just need to use the correct XSD version, 1.2.0, in our Blueprint file:

xmlns:ext="http://aries.apache.org/blueprint/xmlns/blueprint-ext/v1.2.0"

That done, we can leverage the functionality in this way:

[The full blueprint.xml example was lost in the blog's HTML rendering; it declared a service whose properties were computed by inline JEXL expressions, producing the dynamic osgi.jndi.service.name checked below.]

Copy that blueprint.xml directly into the deploy/ folder, and you can check from the Karaf shell that the dynamic invocation of those inline scripts has actually happened:

JBossFuse:karaf@root> ls (id blueprint.xml) | grep osgi.jndi.service.name
osgi.jndi.service.name = /OPT/RH/JBOSS-FUSE-6.2.1.REDHAT-107___3

This might turn out useful in specific scenarios, when you are looking for a quick way to create dynamic configuration.

In case you are interested in implementing your own custom evaluator, this is the interface you need to provide an implementation of:

https://github.com/apache/aries/blob/trunk/blueprint/blueprint-core/src/main/java/org/apache/aries/blueprint/ext/evaluator/PropertyEvaluator.java

And this is an example of the service you need to expose to be able to refer to it in your <property-placeholder> node:

[The service definition example was lost in the blog's HTML rendering; it exposed an OSGi service implementing that PropertyEvaluator interface, so that it could be referenced from the <property-placeholder> node.]

Tuesday, April 26, 2016

Deploy and configure a local Docker caching proxy

Recently I was looking into caching of Docker layer downloads for the Fabric8 development environment, to let me trash the VMs where my Docker daemon was running while still avoiding having to re-download the basic images every single time I recreate a VM.

As usual, I tried to hit Google first, and was pointed to these two pages:

http://unpoucode.blogspot.it/2014/11/use-proxy-for-speeding-up-docker-images.html

and

https://blog.docker.com/2015/10/registry-proxy-cache-docker-open-source/

Since I quite often use Jerome Petazzoni's approach of a transparent Squid + iptables setup to cache and sniff simple HTTP traffic (usually HTTP invocations from Java programs), I felt that the first solution made sense, so I tried it first.

It turned out that the second link was what I was looking for; but I still spent some good learning hours with the no-longer-working suggestions from the first one, learning the hard way that Squid doesn't play that nicely with Amazon's CloudFront CDN, used by Docker Hub. But I have to admit it's been fun.
Now I know how to make calls forwarded to Squid hit an intermediate interceptor that mangles query params, headers and everything else.
I couldn't find a working combination for CloudFront, but I am now probably able to reproduce the infamous Cats Proxy Prank. =)

Anyhow, as I was saying, what I was really looking for is in that second link, which shows how to set up an intermediate Docker proxy that your Docker daemon will try to hit before defaulting to the usual Docker Hub public servers.

Almost everything I needed was on that page, but I found the information to be a little more cryptic than necessary.

The main reason is that the example assumes I need security (TLS), which is not really my case, since the proxy is completely local.

Additionally, it shows how to configure your Docker Registry using a YAML configuration file. Again, not really complex, but more than needed.

Yes, because what you really need to bring up the simplest local (not secured) Docker proxy is this one-liner:

docker run -p 5000:5000 -d --restart=always --name registry   \
  -e REGISTRY_PROXY_REMOTEURL=http://registry-1.docker.io \
  registry:2

The interesting part here is that the registry image supports a smart alternative way of forwarding configuration to it, which saves you from passing it a YAML configuration file.

The idea, described here, is that if you follow a naming convention for the environment variables that reflects the hierarchy of the YAML tree, you can turn something like:

...
proxy:
  remoteurl: http://registry-1.docker.io
...

which you would otherwise have to write in a .yaml file and pass to the process in this way:

docker run -d -p 5000:5000 --restart=always --name registry \
  -v `pwd`/config.yml:/etc/docker/registry/config.yml \
  registry:2

into the much more convenient -e REGISTRY_PROXY_REMOTEURL=http://registry-1.docker.io runtime environment variable!

Let's improve the example a little, so that we also give our Docker proxy a non-volatile storage location for the cached layers, so that we don't lose them between invocations:

docker run -p 5000:5000 -d --restart=always --name registry   \
  -e REGISTRY_PROXY_REMOTEURL=http://registry-1.docker.io \
  -v /opt/shared/docker_registry_cache:/var/lib/registry \
  registry:2

Now we have everything we need to save a good share of bandwidth every time we pull a Docker image that has already passed through our local proxy.

The only remaining bit is to tell our Docker daemon to be aware of the new proxy:

# update your docker daemon config, according to your distro
# content of my `/etc/sysconfig/docker` in Fedora 23
OPTIONS=" --registry-mirror=http://localhost:5000"

Reload (or restart) your Docker daemon and you are done! Just be aware that if you restart the daemon you might also need to restart the Registry container, if you are working on a single node.

An interesting discovery has been learning that the Docker daemon doesn't break if it cannot find the specified registry-mirror. So you can add the configuration and forget about it, knowing that your interactions with Docker Hub will simply benefit from possible hits on your caching proxy, whenever it's running.

You can see it working with the following tests:

docker logs -f registry

will log all the outgoing download requests, and once the set of requests that make up a single image pull has completed, you will also be able to check that the image is now completely served by your proxy with this invocation:

curl http://localhost:5000/v2/_catalog
# sample output
{"repositories":["drifting/true"]}

The article could end here, but since I feel bad about showing how to disable security on the internet, here's also a very short, fully working and tested example of how to implement the same with TLS enabled:

# generate a self-signed certificate; accept the default for every value apart from Common Name, where you have to put your box hostname
mkdir -p certs && openssl req  -newkey rsa:4096 -nodes -sha256 -keyout certs/domain.key  -x509 -days 365 -out certs/domain.crt

# copy to the locally trusted ones, steps for Fedora/Centos/RHEL
sudo cp certs/domain.crt /etc/pki/ca-trust/source/anchors/

# load the newly added certificate
sudo update-ca-trust enable

# run the registry using the keys you have generated, mounting the files inside the container
docker run -p 5000:5000 --restart=always --name registry \
  -v `pwd`/certs:/certs \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

# now you just need to remember that you are working in https, so you need to use that protocol in your docker daemon configuration, instead of plain http; also use that when you interact with the API in curl

Monday, October 19, 2015

Debugging tip: How to simulate a slow hard disk

As a Software Engineer there are times when you’d like to have a slower system.

It doesn't happen really often, actually: usually it's when someone reports a bug in your software that you have never seen before and that you cannot reproduce.

Most of the time, the reason for those ghost bugs is a race condition.

Race conditions are issues you might face with multithreaded programming. Imagine that your software does multiple things at the same time. Although most of the time those things happen in the expected and intuitive order, sometimes they don't, leading to an unexpected state in your program.

They are indeed bugs. It's the dev's fault. But the dev doesn't have many ways to protect himself from them. There are programming styles and technologies that push you to avoid the risk altogether, but I think that, in general, they are a condition every developer has to be familiar with.

So why a slower system helps?

Because most of the time, race conditions are masked by the fact that the operations still happen "reasonably quickly". This ambiguous "reasonably quickly" is the main issue: there is no clear limit or number that tells you how quickly. You just have a higher chance of seeing them if things are slow enough to show that they are not happening in the correct order, or that they are not waiting at the correct checkpoints.

In my experience with Java applications, the main performance-related aspect when reproducing race conditions is disk access speed. More than CPU speed or the amount of RAM, I have noticed that disk speed is the biggest differentiator between similar systems.

In this post I will show how to simulate a slow hard disk on Linux to increase your chances of reproducing race conditions.

The solution is based on nbd and trickle, and it uses the network layer to regulate the I/O throughput of your virtual hard disk.

I'd like to start by adding that this isn't anything new and that I'm not suggesting any particularly revolutionary approach. There are many blog posts out there that describe how to achieve this, but for multiple reasons none of those I have read worked out of the box on my Fedora 22 or CentOS 6 installations.
That is the main reason that pushed me to give back to the internet, adding what might be just another page on the subject.

Let's start with the idea of using nbd, or Network Block Device, to simulate our hard disk.

As far as I understand, there is no official way exposed by the Linux kernel to regulate the I/O speed of generic block devices.

Around the internet you may find many suggestions, spanning from disabling read and write caches to generating real load that keeps your system busy.

QoS can be enforced at the network layer, though. So the idea is to emulate a block device over the network.

Although this might sound very complicated (and maybe it is), the problem has already been solved by the Linux kernel with the nbd module.

Since on my Fedora 22 that module is not enabled automatically by default, we have to install it first, and then enable it:

# install nbd module
sudo yum install nbd

# load nbd module
sudo modprobe nbd

# check nbd module is really loaded
lsmod | grep nbd
nbd                    20480  0

Now that nbd is installed and the module loaded we create a configuration file for its daemon:

# run this command as root
"cat > /etc/nbd-server/config" <<EOF
[generic]
[test]
    exportname = /home/pantinor/test_nbd
    copyonwrite = false
EOF

Here exportname is the path to a file that will represent your slow virtual hard disk.

You can create the file with this command:

# create an empty file, and reserve it 1GB of space
dd if=/dev/zero of=/home/pantinor/test_nbd bs=1G count=1

Now that the config and destination files are in place, you can start the nbd-server daemon:

# start ndb-server daemon
sudo systemctl start nbd-server.service

# monitor the daemon start-up with:
journalctl -f --unit nbd-server.service

At this point you have a server process listening on port 10809, which any client on your network can connect to in order to mount the export as a network block device.

We can attach it with this command:

# "test" corresponds to the configuration section in daemon  config file
sudo nbd-client -N test  127.0.0.1 10809  /dev/nbd0
# my CentOS 6 version of nbd-client needs a slightly different syntax:
#    sudo nbd-client -N test  127.0.0.1   /dev/nbd0

We have now created a virtual block device, called /dev/nbd0, and we can format it as if it were a normal one:

# format device
sudo mkfs /dev/nbd0 

# create folder for mounting
sudo mkdir /mnt/nbd

# mount device, sync option is important to not allow the kernel to cheat!
sudo mount -o sync /dev/nbd0 /mnt/nbd

# add write permissions to everyone
sudo chmod a+rwx /mnt/nbd

Note that we have passed the mount command the -o sync flag. This flag has an important function: it disables an enhancement in the Linux kernel that delays the completion of write operations to devices. Without it, all write operations would look instantaneous, with the kernel actually completing the write requests in the background. With this flag, instead, every operation waits until the write has really completed.

You can check that now you are able to read and write on the mount point /mnt/nbd.

Let’s now temporarily unmount and disconnect from nbd-server:

sudo umount /mnt/nbd

sudo nbd-client -d /dev/nbd0

And let’s introduce trickle.

Trickle is a piece of software you can use to wrap other processes and limit their network bandwidth.

You can use it to limit any other program. A simple test you can perform with it is to use it with curl:

# download a sample file and limits download speed to 50 KB/s
trickle -d 50 -u 50  curl -O  http://download.thinkbroadband.com/5MB.zip

Now, as you can expect, we just need to combine trickle and nbd-server to obtain the desired behavior.

Let's start by stopping the current nbd-server daemon to free up its default port:

sudo systemctl stop nbd-server.service

And let’s start it via trickle:

# start nbd-server limiting its network throughput
trickle -d 20 -u 20 -v nbd-server -d

-d attaches the server process to the console, so the console will be blocked and will be freed only once you close the process or a client disconnects.
Ignore the error message: trickle: Could not reach trickled, working independently: No such file or directory

Now you can re-issue the commands to connect to the nbd-server and re-mount the device:

sudo nbd-client -N test  127.0.0.1 10809  /dev/nbd0

sudo mount -o sync /dev/nbd0 /mnt/nbd

And you are done! Now you have a slow hard disk, /dev/nbd0, mounted on /mnt/nbd.

You can verify the slow behavior in this way:

sudo dd if=/dev/nbd0 of=/dev/null bs=65536 skip=100 count=10
10+0 records in
10+0 records out
655360 bytes (655 kB) copied, 18.8038 s, 34.9 kB/s

# when run against an nbd-server that doesn't use trickle the output is:
# 655360 bytes (655 kB) copied, 0.000723881 s, 905 MB/s

Now that you have a slow partition, you can just put your software's files there to simulate slow I/O.

All the above steps can be converted into helper scripts that make the process much simpler, like those described here: http://philtortoise.blogspot.it/2013/09/simulating-slow-drive.html.

Monday, October 5, 2015

JBoss Fuse - Turn your static config into dynamic templates with MVEL

Recently I rediscovered a JBoss Fuse functionality that I had forgotten about, and I thought other people out there might benefit from this reminder.

This post is focused on JBoss Fuse and Fabric8, but it might also interest all those developers who are looking for minimally invasive ways to add some degree of dynamic behavior to their static configuration files.

The idea of dynamic configuration in OSGi and in Fabric8

The OSGi framework is most often remembered for its class-loading behavior. But apart from that, it also defines other concepts and functionality that the framework has to implement.
One of these is ConfigAdmin.

ConfigAdmin is a service for defining an externalised set of properties files that are logically bound to your deployment units.

The lifecycle of these external properties files is linked to the OSGi bundle lifecycle: if you modify an external property file, your bundle will be notified.
Depending on how you coded your bundle, you can decide to react to the notification and, programmatically or via helper frameworks like Blueprint, invoke code that uses the new configuration.

This mechanism is handy and powerful, and all developers using OSGi are familiar with it.

Fabric8 builds on the idea of ConfigAdmin, and extends it.

With its provisioning capabilities, Fabric8 defines the concept of a Profile that encapsulates deployment units and configuration. It adds a layer of functionality on top of plain OSGi and allows you to manage any kind of deployment unit, not only OSGi bundles, as well as any kind of configuration or static file.

If you check the official documentation you will find the list of "extensions" the Fabric8 layer offers, and you will learn that they are divided mainly into 2 groups: URL Handlers and Property Resolvers.

I suggest everyone interested in this technology dig through the documentation; but to offer a brief summary and a short example, imagine that your Fabric profiles have the capability to resolve some values at runtime using specific placeholders.

ex.

# sample url handler usage, ResourceName is a filename relative to the namespace of the containing Profile:
profile:ResourceName

# sample property handler, the value is read at deploy time, from the Apache Zookeeper distributed registry that is published when you run JBoss Fuse
${zk:/fabric/registry/containers/config/ContainerName/Property}

There are multiple handlers available out of the box, covering what the developers thought were the most common use cases: Zookeeper, Profiles, Blueprint, Spring, System Properties, Managed Ports, etc.

And you might also think of extending the mechanism by defining your own extension: for example, if you want to react to performance metrics you are storing on some system, you can write an extension, with its own syntax convention, that injects values taken from that system.

The limit of all this power: static configuration files

The capabilities I have introduced above are exciting and powerful but they have an implicit limit: they are available only to .properties files or to files that Fabric is aware of.

This means that this functionality is available if you have to manage Fabric Profiles, OSGi properties or other specific technologies that interact with them, like Camel, but it is not enabled for anything that is Fabric-unaware.

Imagine you have custom code that reads an .xml configuration file, and imagine that your code doesn't reference any Fabric object or service.
Your code will process that .xml file as-is. There won't be any magic replacement of tokens or paths, because even though you are running inside Fabric, you are NOT using any directly supported technology and you are NOT notifying Fabric that you might want its services.

To solve this problem you have 3 options:

  1. You write an extension to Fabric that recognises and handles your static resources, delegating the dynamic replacement to the framework code.
  2. You alter the code contained in your deployment unit and, instead of consuming the static resources directly, you ask Fabric services to interpolate them for you.
  3. You use the mvel: url handler (and avoid touching any other code!)

What is MVEL ?

MVEL is actually a programming language: https://en.wikipedia.org/wiki/MVEL .
In particular, it's also a scripting language that you can run directly from source, skipping the compilation step.
It has several characteristics that make it interesting to embed within another application and use to define new behaviors at runtime. For all these reasons, for example, it's also one of the supported languages of the JBoss Drools project, which works with Business Rules you might want to define or modify at runtime.

Why can it be useful to us? Mainly for 2 reasons:

  1. it works well as a templating language
  • Fabric8 already has a mvel: url handler that implicitly also acts as a resource handler!

Templating language

Templating languages are that family of languages (often Domain Specific Languages) in which you can alternate static portions of text, read as-is, with dynamic instructions that are processed at parsing time. I'm probably restating, in a more complicated way, the same idea I introduced above: you can have tokens in your text that will be translated following a specific convention.

This sounds exactly like the capabilities provided by the handlers we introduced above, with an important difference: while those were context-specific handlers, MVEL is a general-purpose technology. So don't expect it to know anything about Zookeeper or Fabric profiles, but do expect it to support generic programming-language concepts like loops, code invocation, reflection and so on.

Fabric supports it!

A reference to the support in Fabric can be found here: http://fabric8.io/gitbook/urlHandlers.html

But let me add a snippet of the original code that implements the functionality, since this is the part where you might find this approach interesting even outside the context of JBoss Fuse:

https://github.com/fabric8io/fabric8/blob/1.x/fabric/fabric-core/src/main/java/io/fabric8/service/MvelUrlHandler.java#L115-L126

public InputStream getInputStream() throws IOException {
  assertValid();
  String path = url.getPath();
  URL url = new URL(path);
  CompiledTemplate compiledTemplate = TemplateCompiler.compileTemplate(url.openStream());
  Map<String, Object> data = new HashMap<String, Object>();
  Profile overlayProfile = fabricService.get().getCurrentContainer().getOverlayProfile();
  data.put("profile", Profiles.getEffectiveProfile(fabricService.get(), overlayProfile));
  data.put("runtime", runtimeProperties.get());
  String content = TemplateRuntime.execute(compiledTemplate, data).toString();
  return new ByteArrayInputStream(content.getBytes());
}

What’s happening here?

First, since it's not shown in the snippet, remember that this is a url handler. This means that the behavior gets triggered for files referred to via a specific URI prefix, in this case mvel:. For example, a valid path might be mvel:jetty.xml.

The other interesting and relatively simple thing to notice is the interaction with the MVEL interpreter.
Like in most templating technologies, even the simplest ones you could implement yourself, you usually have:

  • an engine/compiler, here TemplateCompiler
  • a variable that contains your template, here url
  • a variable that represents your context, that is, the set of variables you want to expose to the engine, here data

Put them all together, ask the engine to do its job, here with TemplateRuntime.execute(...), and what you get as output is a static String: no templating instructions any more, all the logic your template was defining has been applied, possibly augmented with some of the input values taken from the context.

An example

I hope my explanation has been simple enough, but an example is probably the best way to express the concept.

Let's use jetty.xml, contained in the JBoss Fuse default.profile, which is a static resource that JBoss Fuse doesn't treat as a special file, so no replacement functionality is offered for it.

I will show both aspects of the MVEL integration here: reading a value from the context variables, and using programmatic logic (just the sum of 2 integers here):

<Property name="jetty.port" default="@{  Integer.valueOf( profile.configurations['org.ops4j.pax.web']['org.osgi.service.http.port'] ) + 10  }"/>

We are modifying the default value for the Jetty port, taking its initial value from the "profile" context variable, which is a Fabric-aware object that has access to the rest of the configuration:

profile.configurations['org.ops4j.pax.web']['org.osgi.service.http.port']

we explicitly cast it from String to Integer:

Integer.valueOf( ... )

and we add a static value of 10 to the returned value:

.. + 10

Let's save the file and stop our Fuse instance, then restart it and re-create a test Fabric:

# in Fuse CLI shell
shutdown -f

# in bash shell
rm -rf data instances

bin/fuse

# in Fuse CLI shell
fabric:create --wait-for-provisioning

Just wait and monitor logs and… Uh-oh. An error! What’s happening?

This is the error:

2015-10-05 12:00:10,005 | ERROR | pool-7-thread-1  | Activator                        | 102 - org.ops4j.pax.web.pax-web-runtime - 3.2.5 | Unable to start pax web server: Exception while starting Jetty
java.lang.RuntimeException: Exception while starting Jetty
at org.ops4j.pax.web.service.jetty.internal.JettyServerImpl.start(JettyServerImpl.java:143)[103:org.ops4j.pax.web.pax-web-jetty:3.2.5]
…
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)[:1.7.0_76]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)[:1.7.0_76]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)[:1.7.0_76]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)[:1.7.0_76]
at org.eclipse.jetty.xml.XmlConfiguration$JettyXmlConfiguration.set(XmlConfiguration.java:572)[96:org.eclipse.jetty.aggregate.jetty-all-server:8.1.17.v20150415]
at org.eclipse.jetty.xml.XmlConfiguration$JettyXmlConfiguration.configure(XmlConfiguration.java:396)[96:org.eclipse.jetty.aggregate.jetty-all-server:8.1.17.v20150415]
…
Caused by: java.lang.NumberFormatException: For input string: "@{profile.configurations['org.ops4j.pax.web']['org.osgi.service.http.port'] + 1}"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)[:1.7.0_76]
at java.lang.Integer.parseInt(Integer.java:492)[:1.7.0_76]
at java.lang.Integer.<init>(Integer.java:677)[:1.7.0_76]
… 29 more

If you notice, the error message says that our template snippet cannot be converted to a Number.

Why is our template snippet displayed in the first place? The templating engine should have done its part of the job and given us back a static String, without any reference to templating directives!

I have shown you this error on purpose, to insist on a concept I described above but that might go unnoticed at first.

MVEL support in Fabric is implemented as a url handler.

So far we have just modified the content of a static resource file, but we haven't given Fabric any hint that we'd like that file to be handled as an MVEL template.

How to do that?

It's just a matter of using the correct URI to refer to that same file.

So, modify the file default.profile/org.ops4j.pax.web.properties, which is the place in the default Fabric Profile where you define which static file contains the Jetty configuration:

# change it from org.ops4j.pax.web.config.url=profile:jetty.xml to
org.ops4j.pax.web.config.url=mvel:profile:jetty.xml

Now, stop the instance again, remove the Fabric configuration files, recreate a Fabric and notice how your Jetty instance is now running correctly.

We can check it in this way:

JBossFuse:karaf@root> config:list | grep org.osgi.service.http.port
   org.osgi.service.http.port = 8181

From your browser you can verify that Hawtio, the JBoss Fuse web console deployed on top of Jetty, is accessible on port 8191: http://localhost:8191/hawtio

Tuesday, February 17, 2015

JBoss Fuse - Some less known trick

TL;DR

  1. expose java static calls as Karaf shell native commands
  2. override OSGi Headers at deploy time
  3. override OSGi Headers after deploy time with OSGi Fragments

Expose java static calls as Karaf shell native commands

As part of my job as a software engineer who has to collaborate with support folks and customers, I very often find myself needing to extract additional information from a system I don't have access to.
The usual approaches, valid for all kinds of software, are extracting logs, invoking interactive commands to obtain specific outputs or, in the most complex cases, deploying some PoC unit meant to verify a specific behavior.

JBoss Fuse, and Karaf, the platform it's based on, already do a great job of exposing all that data.

You have:

  • extensive logs and integration with Log4j
  • an extensive list of JMX operations (which you can also invoke over HTTP with Jolokia)
  • a large list of shell commands

But sometimes this is not enough. If you have seen my previous post about how to use Byteman on JBoss Fuse, you can imagine all the other cases:

  1. you need to print values that are not logged or returned in the code
  2. you might need to short-circuit some logic to hit a specific execution branch of your code
  3. you want to inject a line of code that wasn't there at all

Byteman is still a very good option too, but Karaf has a facility we can use to run custom code.

Karaf allows you to write code directly in its shell, and to record these bits of code as macros you can re-invoke. Such a macro will look like a native Karaf shell command!

Let's see a real example I had to implement:

verify whether the JVM running my JBoss Fuse instance was resolving a specific DNS name as expected.

The standard JDK has a method you can invoke to resolve a DNS name:
InetAddress.getAllByName(String)

Since that call is simple enough, meaning it doesn't require a complex or structured input, I thought I could turn it into an easy-to-reuse command:

# add all public static methods on a java class as commands  to the namespace "my_context": 
# bundle 0 is because system libs are served by that bundle classloader
addcommand my_context (($.context bundle 0) loadClass java.net.InetAddress) 

That funky line is explained in this way:

  • addcommand is the Karaf shell functionality that registers new commands
  • my_context is the namespace/prefix you will attach your command to. In my case, "dns" would have made a good namespace.
  • ($.context bundle 0) invokes Java code. In particular, we are invoking $.context, a built-in instance the Karaf shell exposes to give access to the OSGi framework, whose type is org.apache.felix.framework.BundleContextImpl. We call its bundle method, passing the argument 0, which is the id of the OSGi bundle whose classloader is responsible for loading the JDK classes. That call returns an instance of org.apache.felix.framework.Felix that we can use to load the specific class definition we need, that is java.net.InetAddress.

As the inline comment says, an invocation of addcommand exposes all the public static methods on that class. So we are now allowed to invoke those methods, and in particular the one that can resolve DNS entries:

JBossFuse:karaf@root> my_context:getAllByName "www.google.com"
www.google.com/74.125.232.146
www.google.com/74.125.232.145
www.google.com/74.125.232.148
www.google.com/74.125.232.144
www.google.com/74.125.232.147
www.google.com/2a00:1450:4002:804:0:0:0:1014

This functionality is described on the Karaf documentation page.
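
Just to make the idea concrete, here is how the same trick could look with the "dns" namespace mentioned above (a hypothetical sketch, not taken from a real session):

# register the same class under a "dns" namespace instead
addcommand dns (($.context bundle 0) loadClass java.net.InetAddress)
dns:getAllByName "www.google.com"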

Override OSGi Headers at deploy time

If you work with Karaf, you are working with OSGi, love it or hate it.
A typical step in each OSGi workflow is playing (or fighting) with OSGi headers.
If you are in total control of your project, this might be more or less easy, depending on the relationships between your deployment units. See Christian Posta's post for a glimpse of some less-than-obvious examples.

Within those conditions, a very typical situation is having to use a bundle, yours or someone else's, whose headers are not correct.
What you very often end up doing is re-packaging that bundle, so that you can alter the content of its MANIFEST and add the OSGi headers that you need.

Karaf has a facility in this regard, called the wrap protocol.
You might already know it as a shortcut way to deploy a non-bundle jar on Karaf, but it's actually more than just that.
What it really does, as the name suggests, is to wrap. But it can wrap both non-bundles and bundles!
Meaning that we can also use it to alter the metadata of an already packaged bundle we are about to install.

Let's give an example, again taken from a real-life experience.
Apache HttpClient is not totally OSGi friendly. We can install it on Karaf with the wrap: protocol and export all its packages.

JBossFuse:karaf@root> install -s 'mvn:org.apache.httpcomponents/httpclient/4.2.5'
Bundle ID: 257
JBossFuse:karaf@root> exports | grep -i 257
   257 No active exported packages. This command only works on started bundles, use osgi:headers instead
JBossFuse:karaf@root> install -s 'wrap:mvn:org.apache.httpcomponents/httpclient/4.2.5$Export-Package=*; version=4.2.5'
Bundle ID: 259
JBossFuse:karaf@root> exports | grep -i 259
   259 org.apache.http.client.entity; version=4.2.5
   259 org.apache.http.conn.scheme; version=4.2.5
   259 org.apache.http.conn.params; version=4.2.5
   259 org.apache.http.cookie.params; version=4.2.5
...

And we can see that it works with plain bundles too:

JBossFuse:karaf@root> la -l | grep -i camel-core
[ 142] [Active     ] [            ] [       ] [   50] mvn:org.apache.camel/camel-core/2.12.0.redhat-610379
JBossFuse:karaf@root> install -s 'wrap:mvn:org.apache.camel/camel-core/2.12.0.redhat-610379\
$overwrite=merge&Bundle-SymbolicName=paolo-s-hack&Export-Package=*; version=1.0.1'
Bundle ID: 269

JBossFuse:karaf@root> headers 269

camel-core (269)
----------------
...

Bundle-Vendor = Red Hat, Inc.
Bundle-Activator = org.apache.camel.impl.osgi.Activator
Bundle-Name = camel-core
Bundle-DocURL = http://redhat.com
Bundle-Description = The Core Camel Java DSL based router

Bundle-SymbolicName = paolo-s-hack

Bundle-Version = 2.12.0.redhat-610379
Bundle-License = http://www.apache.org/licenses/LICENSE-2.0.txt
Bundle-ManifestVersion = 2

...

Export-Package = 
    org.apache.camel.fabric;
        uses:="org.apache.camel.util,
            org.apache.camel.model,
            org.apache.camel,
            org.apache.camel.processor,
            org.apache.camel.api.management,
            org.apache.camel.support,
            org.apache.camel.spi";
        version=1.0.1,

...

Here you can see that Bundle-SymbolicName and the version of the exported packages carry the values I set.

Again, the functionality is described in the Karaf docs, and you might find the wrap protocol reference useful.

Override OSGi Headers after deploy time with OSGi Fragments

The last trick is powerful, but it probably requires you to remove the original bundle if you don't want to risk having half of the classes exposed by one classloader and the remaining ones (those packages you might have added in the overridden Export-Package) served by another one.
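
To make the risk visible, you can list the duplicated bundle and, if you decide to keep only the wrapped one, drop the original (a sketch based on the ids from the listing above; be sure nothing else in your container needs the original first):

# both the original (142) and the wrapped (269) camel-core are now installed
JBossFuse:karaf@root> la -l | grep -i camel-core
# if you decide to keep only the wrapped one:
JBossFuse:karaf@root> osgi:uninstall 142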

There is actually a better way to override OSGi headers, and it comes directly from an OSGi standard functionality: OSGi Fragments.

If you are not familiar with the concept, the definition taken directly from the OSGi wiki is:

A Bundle fragment, or simply a fragment, is a bundle whose contents are made available to another bundle (the fragment host). Importantly, fragments share the classloader of their parent bundle.

That page also gives a further hint about what I will describe:

Sometimes, fragments are used to 'patch' existing bundles.

We can use this strategy to:

  • inject .jars in the classpath of our target bundle
  • alter headers of our target bundle

I have used the first case to fix a badly configured bundle that was looking for an XML configuration descriptor it didn't include, and which I provided by deploying a light fragment bundle containing just that file.

But the use case I want to show you here is an improvement in the way we deploy Byteman on JBoss Fuse/Karaf.

If you remember my previous post, since Byteman classes need to be available to every other deployed bundle, and potentially need access to every class available, we had to add the Byteman packages to the org.osgi.framework.bootdelegation property, which instructs the OSGi framework to expose the listed packages through the virtual system bundle (id = 0).

You can verify what it is currently serving with headers 0; I won't include the output here since it's a long list of JDK extension and framework packages.

If you add your packages (org.jboss.byteman.rule,org.jboss.byteman.rule.exception in my case), these packages will also be listed in the output of that command.
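
For instance, something along these lines should show them after the restart (a sketch; the exact output depends on your Fuse version):

# the Byteman packages should now appear among what the system bundle serves
JBossFuse:karaf@root> headers 0 | grep -i byteman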

The problem with this solution is that it relies on a boot-time property. If you want to use Byteman to manipulate the bytecode of an already running instance, you have to restart it after you have edited this property.

OSGi fragments can help here and avoid any pre-configuration at boot time.

We can build a custom empty bundle, with no real content, that attaches to the system bundle and extends the list of packages it serves.

<Export-Package>
    org.jboss.byteman.rule,org.jboss.byteman.rule.exception
</Export-Package>
<Fragment-Host>
    system.bundle; extension:=framework
</Fragment-Host>

That's an excerpt of the maven-bundle-plugin configuration; see here for the full working Maven project, although the project is really just 30 lines of pom.xml:

JBossFuse:karaf@root> install -s mvn:test/byteman-fragment/1.0-SNAPSHOT
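
A fragment never reaches the Active state, so a quick way to confirm it attached to its host is to check that it is listed as Resolved (a sketch, reusing the artifact name from the install above):

# the fragment should show up as Resolved, meaning it attached to the system bundle
JBossFuse:karaf@root> la -l | grep -i byteman-fragment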

Once you have that configuration in place, you are ready to use Byteman to, for example, inject a line into the java.lang.String default constructor.

# find your Fuse process id
PROCESS_ID=$(ps aux | grep karaf | grep -v grep | cut -d ' ' -f2)

# navigate to the folder where you have extracted Byteman
cd /data/software/redhat/utils/byteman/byteman-download-2.2.0.1/

# export Byteman env variable:
export BYTEMAN_HOME=$(pwd)
cd bin/

# attach Byteman to Fabric8 process, no output expected unless you enable those verbose flags
sh bminstall.sh -b -Dorg.jboss.byteman.transform.all $PROCESS_ID 
# add these flags if you have any kind of problem and want to see what's going on: -Dorg.jboss.byteman.debug -Dorg.jboss.byteman.verbose

# install our Byteman custom rule, we are passing it directly inline with some bash trick
sh bmsubmit.sh /dev/stdin <<OPTS

# smoke test rule that uses also a custom output file
RULE DNS StringSmokeTest
CLASS java.lang.String
METHOD <init>()
AT ENTRY
IF TRUE
DO traceln(" works: " );
traceOpen("PAOLO", "/tmp/byteman.txt");
traceln("PAOLO", " works in files too " );
traceClose("PAOLO");
ENDRULE

OPTS

Now, to verify that Byteman is working, we can just invoke java.lang.String constructor in Karaf shell:

JBossFuse:karaf@root> new java.lang.String
 works: 

And as per our rule, you will also see the content in /tmp/byteman.txt
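
Since the rule also traced to a file, you can verify that part from a regular shell (assuming the path used in the rule above):

# the traceOpen/traceln calls in the rule wrote here
cat /tmp/byteman.txt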

Inspiration for this third trick comes from both the OSGi wiki and this interesting page from the Spring guys.

If you have any feedback or any other interesting workflow, please leave a comment.

Tuesday, October 7, 2014

Use Byteman in JBoss Fuse / Fabric8 / Karaf

Have you ever found yourself trying to understand how come something very simple is not working?

You are writing code in a well-known context and, for whatever reason, it's not working. You trust your platform, so you carefully read all the logs that you have.
And still you have no clue why something is not behaving as expected.

Usually, what I do next, if I am lucky enough to be working on an open source project, is start reading the code.
That often works; but almost always you haven't written that code and you don't know the product that well. So, yes, you see which variables are in the context, but you have no clue about their possible values and, what's worse, no idea where or, even worse, when those values were created.

At this point, what I usually do is connect with a debugger. I will never remember the JVM parameters a Java process needs to allow debugging, but I know I have them written down somewhere, and modern IDEs suggest them, so it's not a big pain to connect remotely to a complex application server.

Okay, we are connected. We can place a breakpoint not far from the section we consider important and step through the code, possibly adding more breakpoints.
The IDE variables view allows us to see the values of the variables in context. We can even browse the whole object tree and invoke snippets of code, which is useful when the plain memory state of an object doesn't really give the precise information we need (imagine you want to format a Date or filter a collection).

We have all the instruments but... this is a slow process.
Each time I stop at a specific breakpoint I have to manually browse the variables. I know, we can improve the situation with watched variables, which stick to the top of the overview window and give you a quick look at what you have already identified as important.
But I personally find that watches make sense only if you have a very small set of variables: since they all share the same namespace, you end up with many unset values that just distract the eye whenever you are not in a scope that sees those variables.

I have recently learnt a trick to improve these workflows that I want to share with you in case you don't know it yet:

IntelliJ and, with a smart trick, even Eclipse allow you to add print statements when you pass through a breakpoint. If you combine this with preventing the breakpoint from pausing, you have a nice way to augment the code you are debugging with log invocations.

For IntelliJ check here: http://www.jetbrains.com/idea/webhelp/enabling-disabling-and-removing-breakpoints.html

While instead for Eclipse, check this trick: http://moi.vonos.net/2013/10/adhoc-logging/ or let me know if there is a cleaner or newer way to reach the same result.

The trick above works, but its main drawback is that you are adding a local configuration to your workspace. You cannot share it easily with someone else, and if you re-use your workspace for some other session, seeing all those log entries or breakpoints can be distracting.

So, while looking for something external to my IDE, I decided to give Byteman a try.

Byteman actually offers much more than what I needed this time, and that's probably the main reason I decided to find out whether I could use it with Fabric8.

A quick recap of what Byteman does taken directly from its documentation:

Byteman is a bytecode manipulation tool which makes it simple to change the operation of Java applications either at load time or while the application is running.
It works without the need to rewrite or recompile the original program.

Offers:

  • tracing execution of specific code paths and displaying application or JVM state
  • subverting normal execution by changing state, making unscheduled method calls or forcing an unexpected return or throw
  • orchestrating the timing of activities performed by independent application threads
  • monitoring and gathering statistics summarising application and JVM operation

In my specific case I am going to use the first of those listed behaviors, but you can easily guess that all the other aspects might come in handy at some point:

  • add some logic to prevent a NullPointerException
  • short-circuit some logic because you are hitting a bug that is not in your code base, but you still want to see what would happen if that bug weren't there
  • anything else you can imagine...

Getting started with Byteman is normally particularly easy. You are not even forced to start your JVM with specific instructions: you can just attach to an already running process!
This works most of the time, but unluckily not on Karaf with the default configuration, due to OSGi implications. No worries though, the functionality is just a simple configuration edit away.

You have to edit the file:

$KARAF_HOME/etc/config.properties

and add these 2 packages to the property org.osgi.framework.bootdelegation:

org.jboss.byteman.rule,org.jboss.byteman.rule.exception
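
Just to illustrate, the edited entry could end up looking something like this (a sketch: keep whatever packages your installation already lists in place of the "..."):

# etc/config.properties (excerpt)
org.osgi.framework.bootdelegation = ..., org.jboss.byteman.rule, org.jboss.byteman.rule.exception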

That property instructs the OSGi framework to provide the classes in those packages from the parent classloader. See http://felix.apache.org/site/apache-felix-framework-configuration-properties.html

In this way, you will avoid the ClassCastExceptions raised when your Byteman rules are triggered.

That's pretty much all the extra work we needed to use Byteman on Fuse.

Here is a practical example of my interaction with the platform:

# assume you have modified Fabric8's config.properties and started it and that you are using fabric8-karaf-1.2.0-SNAPSHOT

# find your Fabric8 process id
$ ps aux | grep karaf | grep -v grep | cut -d ' ' -f3
5200

# navigate to the folder where you have extracted Byteman
cd /data/software/redhat/utils/byteman/byteman-download-2.2.0.1/
# export Byteman env variable:
export BYTEMAN_HOME=$(pwd)
cd bin/
# attach Byteman to Fabric8 process, no output expected unless you enable those verbose flags
sh bminstall.sh 5200 # add these flags if you have any kind of problem and want to see what's going on: -Dorg.jboss.byteman.debug -Dorg.jboss.byteman.verbose
# install our Byteman custom rules
$ sh bmsubmit.sh ~/Desktop/RBAC_Logging.btm
install rule RBAC HanldeInvoke
install rule RBAC RequiredRoles
install rule RBAC CanBypass
install rule RBAC UserHasRole
# invoke some operation on Fabric8 to trigger our rules:
$ curl -u admin:admin 'http://localhost:8181/jolokia/exec/io.fabric8:type=Fabric/containersForVersion(java.lang.String)/1.0' 
{"timestamp":1412689553,"status":200,"request":{"operation...... very long response}

# and now check your Fabric8 shell:
 OBJECT: io.fabric8:type=Fabric
 METHOD: containersForVersion
 ARGS: [1.0]
 CANBYPASS: false
 REQUIRED ROLES: [viewer, admin]
 CURRENT_USER_HAS_ROLE(viewer): true

My Byteman rules look like this:

RULE RBAC HanldeInvoke
CLASS org.apache.karaf.management.KarafMBeanServerGuard
METHOD handleInvoke(ObjectName, String, Object[], String[]) 
AT ENTRY
IF TRUE
DO traceln(" OBJECT: " + $objectName + "
 METHOD: " + $operationName + "
 ARGS: " + java.util.Arrays.toString($params) );
ENDRULE

RULE RBAC RequiredRoles
CLASS org.apache.karaf.management.KarafMBeanServerGuard
METHOD getRequiredRoles(ObjectName, String, Object[], String[])
AT EXIT
IF TRUE
DO traceln(" REQUIRED ROLES: " + $! );
ENDRULE

RULE RBAC CanBypass
CLASS org.apache.karaf.management.KarafMBeanServerGuard
METHOD canBypassRBAC(ObjectName) 
AT EXIT
IF TRUE
DO traceln(" CANBYPASS: " + $! );
ENDRULE

RULE RBAC UserHasRole
CLASS org.apache.karaf.management.KarafMBeanServerGuard
METHOD currentUserHasRole(String)
AT EXIT
IF TRUE
DO traceln(" CURRENT_USER_HAS_ROLE(" + $requestedRole + "): " + $! );
ENDRULE

Obviously this was just a short example of what Byteman can do for you. I'd invite you to read the project documentation, since you might discover nice constructs that allow you to write simpler rules, or to refine them so they really trigger only when it's relevant for you (if you see some noise in the output of my example, you probably have a Hawtio instance open that is doing its polling, thus triggering some of our installed rules).

A special thank you goes to Andrew Dinn, who explained to me how Byteman works and the reason for my initial failures.

The screencast is less than optimal due to my errors ;) but you can clearly see the added noise, since I had a Hawt.io instance invoking protected JMX operations!