Tuesday, September 26, 2017

Ansible - A handy tool for people that might not need it

tl;dr

Ansible is a good tool. Everything it gives you can be achieved in many different ways, and you might already be doing it that way. With this assumption, I'll try to show you why you might like it anyway.

Ansible and I

I have recently passed the Red Hat Certificate of Expertise in Ansible Automation exam.

I took the exam because I felt that I didn't know much about Ansible, and because the topic was hot, with new modules and integrations popping out on a daily basis.
Each time I can match a new learning opportunity with the possibility to verify my understanding of a topic against a formal Red Hat exam, I take the chance. That's how I got that nice t-shirt when I got my RHCA =P

What’s Ansible?

Ansible is a configuration management and deployment devops tool.

Let's break this apart, starting from the end: I haven't thrown DevOps in here just as a catchy buzzword, but to give you some context. We are in the DevOps world here, somewhere in between system administration, operations and developers' deliverables.
We are dealing with configuration management, meaning that we probably have quite a large matrix of specific configurations that has to match an equivalently complex matrix of environments to apply them to.
And it seems that we are also allowed to use the tool to PUT deliverables and configuration on the target machines we are interested in managing.

So far so good.

Just to check if you are still with me, I’ll challenge you with a provocative question:

Tell me the name of a technology, out there since forever, that you have used to perform the same task in the past.

I don't pretend to guess everyone's answer, but my hope is that a large portion of readers would reply with some variation of this: scripts, or bash.

If you replied that too, we are definitely on the same page, and I can assure you that I'll try to keep this comparison in mind in this blog post, hoping it's a useful way to tackle the topic.

Alternatives

If you work in IT, probably at any level, you must have some familiarity with some of the alternative tools that perform a similar job.
The usual suspects are Chef, Puppet, Salt and bash/scripts for configuration management, and Capistrano, Fabric and bash/scripts for deployment.
Again, I'm stressing the custom scripts part here. Besides the fact that anything carrying the custom adjective in its name is a possible code smell in an industry that aims to improve via standardization, it also implicitly suggests that if your software management model is mature enough to give you everything you need, you are probably already where you want to be.

A specific characteristic that distinguishes Ansible from most of its alternatives is the fact that it has an agentless architecture.
This means that you won’t have a daemon or equivalent process, always running on the managed system, listening for a command from a central server to perform any operation.

But… you will still have to rely on a daemon, always running, to allow your managed node to perform the operations you want.
Is this a fraud?
No, but I like to draw out the pros and cons of things, without cheating and hiding behind too-good-to-be-true marketing claims.
For Ansible to be able to work in its agentless way, you have to rely on an SSHD daemon (or equivalent, when you are outside *NIX) to allow remote connections.
So it's not a fraud. An always-running SSHD process is found on many systems, even when you don't use Ansible to manage them. But at the same time you cannot say that just because you don't have an agent running on a managed node, you don't have anything else running in its place!

Is this all? Ansible's marketing says yes; I personally say no.

Uh? The usual story out there cites the refrain that SSH is the only thing you need to run Ansible. I feel that this is not completely correct, and if you take it as an absolute assertion it's just plain wrong:

  1. Besides sshd, Ansible needs python installed on the managed host. This might be overlooked if you only manage nodes based on modern Linux distros. But in recent times, when you might find yourself managing lower-level devices like IoT ones, which try to reduce the number of installed packages, you might end up without python. In those cases, Ansible wouldn't work.

  2. In the specific case of local deployment, you don't even need SSHD! I get that this is a specific use case, but I think it's an important one. I might want to use Ansible as a full replacement for a large bash script, for example. Bash doesn't assume ssh: it's mainly a collection of command invocations tied together with execution logic. Ansible can be that for you. And if that's the case, you don't even need sshd running locally, since its implicit support for localhost allows you to run an Ansible script anyway (see the sketch right after this list).
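
As a minimal sketch of that local use case (the playbook and the task are made up for illustration), a playbook targeting localhost with a local connection runs without any SSH involved:

---
# local.yml: hypothetical playbook that never touches SSH
- name: Local only playbook
  hosts: localhost
  connection: local
  tasks:
  - name: create a working directory
    file:
      path: /tmp/my_workdir
      state: directory
...

# run it directly, no sshd required
ansible-playbook local.yml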

Why python?

This is an important question to understand how Ansible operates:

Ansible allows you to express scripts, which it calls playbooks, written in a custom YAML-based syntax/dialect.

Why python then?
Because the playbook written in YAML is not the artifact that is going to be run. It's just an intermediate artifact that is preprocessed by the Ansible toolkit to produce a fully valid python program performing the instructions expressed in the higher-level YAML syntax.
This python program is then copied to the managed node and run. Now you see why python is a strict requirement on the managed nodes.

The DSL

Ansible uses a YAML based DSL for its scripts.
This is one of those points where I have mixed feelings: in general, I'm pro Domain Specific Languages. The expectation is that they offer facilities at the language level to perform the most common tasks in the domain in an easier way. The main reason for my mixed feelings is that I have quite a bad memory for language syntaxes. I can make things work with a sample to look at, or with the internet, but I'm never sure about the language-specific conventions for functions, parentheses and such.

In the case of Ansible, and considering that the main alternative is bash, which is not really intuitive for anything related to formatting, semicolons, control structures and string manipulation, I can honestly say that the DSL idea works well.

This is the skeleton of an Ansible playbook:

---
- name: My Playbook
  hosts: host1,host2
  tasks:
  - name: say hello
    debug:
      msg: "Hello world"
  - name: copy file
    copy:
      src: /path/to/source
      dest: /path/to/dest
...

And I swear I typed it without copying it.
As you can see, it's reasonably intuitive.
As you can imagine, someone needs to tell you the list of allowed keywords you can use, and you have to know what's mandatory or optional, but you can probably read it and start setting your expectations.

If you are like me (or Alan Kay), you are probably already mumbling that yeah, Ansible looks simple for simple tasks, but I haven't proved how fit it is for advanced stuff. Just be patient for a little longer, since I'll come back to this later.

The structure of the sample script above covers a large part of what you need to know about Ansible. There are 2 key elements:

  • hosts
  • tasks

Not surprisingly, you have a list of hosts. You might have written a similar bash script with hostnames externalized in their own variable, but these are more complex than they look. They are actually not a plain list of addresses; they are keys in the dictionary that Ansible uses to keep track of managed hosts. You may wonder: why all this complexity?

There are 2 main reasons:

  • The first one is that this way, having a complex object/dictionary tied to each managed host, you can also attach a lot of other data and metadata to the entry. You have a place to specify users, connection parameters and so on, up to custom variables you want to assign a special value to, just for that specific node.
  • The second one is less straightforward, and would actually be harder to implement if things weren't broken down this way: this decoupled mechanism allows you to easily plug in dynamic data for the collection of hosts, which Ansible calls the inventory (a small static example follows this list).
    In Ansible, an inventory can be a static file with the list of hosts and their properties, or it can be any executable that, when invoked with special parameters defined by a service contract, returns a JSON object containing the same information you can have in a static file. If this seems overkill to you, just think about how much less static things are nowadays with virtualization and cloud environments.
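
To give you an idea of the static case, a minimal INI-style inventory could look like this (hosts, users and variables are made up):

[webservers]
host1 ansible_user=admin ansible_port=2222 favourite_color=blue
host2

[webservers:vars]
http_port=8080

Each line under a group is the key Ansible uses to identify the node, followed by its host-specific variables; group-wide values live in the :vars section.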

Now the tasks part.
This is the easy part because something else encapsulates the harder aspects.
tasks is just a list of individual operations to perform.
As simple as that.
The operations are provided to you by a large set of standard library modules. In our example, we used 2 very simple ones: debug, used just to print out strings, and copy, which, as you can guess, just copies a file.
This is probably the aspect where Ansible shines the most: it has a massive set of off-the-shelf components that you can start using just by passing them the required configuration.
The list here is really long. You can check it yourself with ansible-doc, a companion CLI tool that is your best friend, probably even more than Google, when you are working on a script:
ansible-doc -l gives you the full list of modules currently supported by Ansible, from commands that configure enterprise load balancers to others that configure and deploy AWS Lambdas. Literally everything for everyone!

ansible-doc is super useful because the style of the documentation it lets you browse has been kept light on purpose, with a large focus on the examples.

For example, if you inspect the documentation of the get_url module and jump to the EXAMPLES section, you'll probably find the snippet you need to start customizing the command for your specific needs:
ex.

ansible-doc get_url
...
EXAMPLES:
- name: download foo.conf
  get_url:
    url: http://example.com/path/file.conf
    dest: /etc/foo.conf
    mode: 0440
...

A large standard library plus light inline documentation is most likely something your custom bash scripts cannot offer you.

Up until this point, if you are a skeptic like me, you would probably tell me that you appreciate it, but that other than a particularly large module library and excellent documentation, you don't see any compelling reason to move your consolidated practices to a new tool.

And I agree.

So I'll weigh in some further features.

ansible-playbook, the CLI command you pass your scripts to for execution, has some built-in facilities that, once again, are really handy to the person who is writing scripts and has to debug them.

For example, you can pass it a flag so that the Ansible prompt asks you whether you really want to run a specific task or skip it (--step).
Or you might want to use a flag to provide the name of the specific task in your list to start the execution from: --start-at-task=.
Or you might want to run only the tasks tagged with a specific marker: --tags=.
You even have a flag (--check) to run the script in test mode, so that no real operation is actually invoked but, when possible and when it makes sense, you get back output telling you what effect your command invocation would have had on the remote system!
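
For reference, this is roughly what those invocations look like on the command line (the playbook name, task name and tag are made up):

# confirm or skip each task interactively
ansible-playbook site.yml --step

# resume execution from a given task
ansible-playbook site.yml --start-at-task="copy file"

# run only the tasks carrying a given tag
ansible-playbook site.yml --tags=configuration

# dry run: report what would change, without changing anything
ansible-playbook site.yml --check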

These (and many others I don't have time to list here) are definitely nice facilities for anyone who has to write a script, in any language.

Another feature/facility that is important to underline is the fact that Ansible module operations are designed to be idempotent.
This means that they are written in such a way that if you perform the same exact operation more than once, it will leave the system in the same state as after the first invocation.

This is probably better explained with an example:
Imagine that the operation you have to perform is to put a specific line in a file.
A naive way to obtain this in bash is with concatenation: if I just use the >> operator, I'm sure I will always add my entry to that file.
But what happens if the entry is already there? If I concatenate a second time, I add a second entry, and that is unlikely to be what I want most of the time.
To solve my issue correctly, I should append the entry only if it wasn't there before.
It's still not super complex, but I definitely have to keep more things in mind.

Ansible modules try to encapsulate this “idempotency” responsibility: you just declare the line of text you want in a file, and it's up to the module to verify that it won't be added a second time if it's already present.
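
As a concrete sketch, the "append only if missing" requirement can be handed to the lineinfile module (the module is real, while the file path and line here are just examples); no matter how many times you run it, the line ends up in the file exactly once:

...
- name: ensure the entry is present exactly once
  lineinfile:
    path: /etc/hosts
    line: "192.168.33.10 myhost.example.com"
...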

Another way to explain this approach is that Ansible syntax is declarative. You just need to define the final state of the system you are interacting with, not really the complete logic performed to get to that state.

This nifty feature allows you to run the same script against the same node more than once, without the risk of ruining the system configuration. It lets you avoid disrupting a healthy system if a script is run twice by mistake, but it also speeds things up a lot in many cases: imagine a complex script that downloads files, extracts them, installs packages and so on… If a system is already in the final state declared by your script, you can avoid performing redundant steps!

Idempotency is also important to understand ansible-playbook output.

At the end of an invocation of a script, you’ll get a recap similar to this:

PLAY RECAP ****************************************************************************
host1                  : ok=1    changed=1    unreachable=0    failed=0   

This summary recaps what happened in the script. You also get the full transcript in the lines above it, which I'm not showing here, but I want to focus on the recap.

The recap is strictly tied to the idempotency idea: the tasks counted in the ok column are those that have NOT altered the system, since it was already in the desired state!
Those that have actually performed some work are counted in the changed column.

Some built-in constructs like handlers, and also the logic you might want to use in your own scripts, are built around this concept of changed: do other stuff only if a specific task has changed the system; otherwise you probably don't need to (since the system hasn't changed).
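
To make the handler idea concrete, here is a short sketch (the service and file names are made up): the handler runs only when the template task reports changed, so an unchanged configuration does not trigger a useless restart.

---
- name: Playbook with a handler
  hosts: host1
  tasks:
  - name: deploy configuration
    template:
      src: /mytemplates/foo.j2
      dest: /etc/foo.conf
    notify: restart foo
  handlers:
  - name: restart foo
    service:
      name: foo
      state: restarted
...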

Advanced stuff

Disclaimer: this is not really advanced stuff. But it's the kind of stuff that, in bash, you often have to rely on documentation or examples to figure out how it's done.

Let’s start with the setup module:
Ansible's default behavior, unless you ask it to turn this off, is to run, as the very first task that you don't have to define explicitly, a discovery module called setup. This module gathers a lot of useful information about the system you are connecting to and uses it to populate a set of variables you can use in your playbook.
It's a handy way to have quick access to information like IP addresses, memory availability and many other data points or metrics you might base your logic on.
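
You can dump those variables, which Ansible calls facts, from the command line and then reference them in a task; the fact names used below are standard ones, while the host is made up:

# inspect all the facts gathered for a node
ansible host1 -m setup

...
- name: show the default IPv4 address
  debug:
    msg: "{{ inventory_hostname }} has address {{ ansible_default_ipv4.address }}"
...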

What about loops?

Bash has that bizarre rule of carriage return vs. semicolon that makes me unable to remember how to write a loop without checking the internet.

In Ansible it's simpler:

...
- name: sample loop, using debug but works with any module
  debug:
    msg: "{{ item }}"
  with_items:
  - "myFirstItem"
  - "mySecondItem"
...

There is a collection of with_* keywords you can use to iterate over specific objects; in our case a plain YAML list, but it could be a dictionary, a list of file names with glob support, and many others.

Conditional flow
There is no strict equivalent for if/else logic. At least, not one meant to create a traditional execution tree.
The pattern here is to push you to keep the execution logic linear, with just a boolean flag that enables or disables the specific task.

And it’s as simple as writing:

...
when: true
...

And you guessed right if you think that any expression that evaluates to true can be specified there, something like: ansible_os_family == "RedHat" and ansible_lsb.major_release|int >= 6.

Don't be disappointed by my comment about the fact that Ansible doesn't encourage nested if/else logic. You can still get that with a combination of other keywords, like block and when. But the point is to try to keep the execution flow linear, and thus simpler.
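
A short sketch of that block + when combination (the package and the condition are just examples): the condition is evaluated once and applies to every task in the block.

...
- block:
    - name: install the web server
      yum:
        name: httpd
        state: present
    - name: start the web server
      service:
        name: httpd
        state: started
  when: ansible_os_family == "RedHat"
...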

The last advanced feature I want to cover is Ansible templates and string interpolation.

The tool has an embedded mechanism that allows you to build dynamic strings: you define a basic skeleton for a text with some placeholders, and they are automatically replaced by the values of the corresponding variables present in the running context.

For example:

debug:
  msg: "Hello {{ inventory_hostname }}"

The same feature is also available in the form of the template module: you can define an external file containing your template, which Ansible will process, applying values for the tokens it finds, and, in a single pass, copy to the remote system. This is an ideal use case for configuration files, for example:

template:
  src: /mytemplates/foo.j2
  dest: /etc/file.conf
  owner: bin
  mode: 0644
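
Just to complete the picture, the content of such a foo.j2 template could be something like this (the keys and the facts used here are illustrative):

# /mytemplates/foo.j2
# rendered with the variables available for the target host, then copied to dest
listen_address = {{ ansible_default_ipv4.address }}
node_name      = {{ inventory_hostname }}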

I've decided to stop my overview here, even though Ansible offers many more of the features you'd expect from a DevOps tool, and many others besides!

My hope is that this article has triggered your curiosity to explore the topic on your own, having shown that, even though you can probably obtain the same effect with other technologies, Ansible's proposition is interesting and might deserve a deeper look.

Thursday, June 29, 2017

OAuth2, JWT, Open-ID Connect and other confusing things

Disclaimer

I feel I have to start this post with an important disclaimer: don't trust too much what I'm about to say.
The reason I say this is because we are discussing security, and when you talk about security, anything other than 100% correct statements risks exposing you to problems of some sort.
So, please, read this article keeping in mind that your source of truth should be the official specifications, and that this is just an overview that I use to recap this topic in my own head and to introduce it to beginners.

Mission

I have decided to write this post because I have always found OAuth2 confusing. Even now that I know a little more about it, I still find some of its parts puzzling.
Even if I was able to follow online tutorials from the likes of Google or Pinterest when I needed to fiddle with their APIs, it always felt like some sort of voodoo, with all those codes and Bearer tokens.
And each time they mentioned that I could make my own decisions for specific steps, choosing among the standard OAuth2 approaches, my mind tended to go blank.

I hope I'll be able to fix a few ideas, so that from now on you will be able to follow OAuth2 tutorials with more confidence.

What is OAuth2?

Let’s start from the definition:

OAuth 2 is an authorisation framework that enables applications to obtain limited access to user accounts on an HTTP service.

The above sentence is reasonably understandable, but we can improve things if we pinpoint the chosen terms.

The Auth part of the name reveals itself to be Authorisation (it could have been Authentication; it's not).
Framework can be easily overlooked, since the term framework is often abused; but the idea to keep here is that it's not necessarily a final product or something entirely defined. It's a toolset. A collection of ideas, approaches and well defined interactions that you can use to build something on top of!
It enables applications to obtain limited access. The key here is that it enables applications, not humans.
limited access to user accounts is probably the key part of the definition that can help you remember and explain what OAuth2 is:
the main aim is to allow a user to delegate access to a user owned resource. Delegating it to an application.

OAuth2 is about delegation.

It's about a human instructing a piece of software to do something on her behalf.
The definition also mentions limited access, so you can imagine being able to delegate just part of your capabilities.
And it concludes by mentioning HTTP services. This authorisation-delegation happens on an HTTP service.

Delegation before OAuth2

Now that the context should be clearer, we could ask ourselves: How were things done before OAuth2 and similar concepts came out?

Well, most of the time, it was as bad as you can guess: with a shared secret.

If I wanted software A to be granted access to my stuff on server B, most of the time the approach was to give my user/pass to software A, so that it could use it on my behalf.
This is still a pattern you can see in a lot of modern software, and I personally hope it's something that makes you uncomfortable.
You know what they say: if you share a secret, it's no longer a secret!

Now imagine if you could instead create a new account/password pair for each service you need to share something with. Let's call them ad-hoc passwords.
They are something different from your main account for a specific service, but they still allow access to the same service as if they were you. You would be able, in this case, to delegate, but you would still be responsible for keeping track of all these new application-only accounts you need to create.

OAuth2 - Idea

Keeping in mind that the business problem we are trying to solve is the “delegation” one, we want to extend the ad-hoc password idea and take away from the user the burden of managing these ad-hoc passwords.
OAuth2 calls these ad-hoc passwords tokens.
Tokens are actually more than that, and I'll try to illustrate it, but it might be useful to associate them with this simpler idea of an ad-hoc password to begin with.

OAuth2 - Core Business

OAuth2's core business is about:

  • how to obtain tokens

OAuth2 - What’s a token?

Since everything seems to focus around tokens, what’s a token?
We have already used the analogy of the ad-hoc password, which has served us well so far, but maybe we can do better.
What if we look for the answer inside the OAuth2 specs?
Well, prepare to be disappointed. The OAuth2 specs do not give you the details of how to define a token. How is this even possible?
Remember when we said that OAuth2 is “just a framework”? Well, this is one of those situations where that definition matters!
The specs just give you the logical definition of what a token is and describe some of the capabilities it needs to possess.
But in the end, what the specs say is that a token is a string. A string containing credentials to access a resource.
They give some more detail, but it can be said that most of the time it's not really important what's in a token. As long as the application is able to consume it.

A token is that thing that allows an application to access the resource you are interested in.

To point out how much you can avoid overthinking what a token is, the specs also explicitly say that it “is usually opaque to the client”!
They are practically telling you that you are not even required to understand tokens!
Fewer things to keep in mind doesn't sound bad!

But to avoid turning this into a pure philosophy lesson, let's show what a token could be:

{
   "access_token": "363tghjkiu6trfghjuytkyen",
   "token_type": "Bearer"
}

A quick glimpse shows us that, yeah, it's a string. JSON-like, but that's probably just because JSON is popular these days, not necessarily a requirement.
We can spot a section with what looks like a random string, an id: 363tghjkiu6trfghjuytkyen. Programmers know that when you see something like this, at least when the string is not too long, it's probably just a key that you can correlate with more detailed information stored somewhere else.
And that is true also in this case.
More specifically, the additional information will be the details about the specific authorisation that that code represents.

But then another thing should capture your attention: "token_type": "Bearer".

Your reasonable questions should be: what are the characteristics of a Bearer token type? Are there other types? Which ones?

Luckily for our efforts to keep things simple, the answer is easy (some may say so easy it's confusing…)

Specs only talk about Bearer token type!

Uh, so why did the person who designed the token this way feel they had to specify the only known value?
You might start seeing a pattern here: because OAuth2 is just a framework!
It suggests how to do things, and it does some of the heavy lifting for you by making some choices, but in the end you are responsible for using the framework to build what you want.
We are just saying that, despite the fact that here we only talk about Bearer tokens, it doesn't mean you can't define your own custom type, with a meaning you are allowed to attribute to it.

Okay, just a single type. But that is a curious name. Does the name imply anything relevant?
Maybe this is a silly question, but for non-native English speakers like me, what Bearer means in this case can be slightly confusing.

Its meaning is quite simple actually:

A Bearer token works like this: if you have a valid token, we trust you. No questions asked.

So simple it's confusing. You might be arguing: “well, all the token-like objects in the real world work that way: if I have valid money, you exchange it for the goods you sell”.

Correct. That’s a valid example of a Bearer Token.

But not every token is of the Bearer kind. A flight ticket, for example, is not a Bearer token.
Having a ticket is not enough to be allowed to board a plane. You also need to show a valid ID that your ticket can be matched against; and only if your name matches the ticket, and your face matches the ID card, are you allowed to get on board.

To wrap this up, we are working with a kind of token where, if you possess one, that's enough to get access to a resource.

And to keep you thinking: we said that OAuth2 is about delegation. Tokens with this characteristic are clearly handy if you want to hand them to someone you are delegating to.

A token analogy

Once again, this might be my non-native English speaker background suggesting I clarify this.
When I look up the first translation of token into Italian, my first language, I'm pointed to a physical object.
Something like this:

Token

That, specifically, is an old token, used to make phone calls in public telephone booths.
Despite being a Bearer token, its analogy with OAuth2 tokens is quite poor.
A much better picture has been painted by Tim Bray in this old post: An Hotel Key is an Access Token.
I suggest you read the article directly, but the main idea is that, compared to the physical metal coin linked above, your software token is something that can have a lifespan, can be disabled remotely and can carry information.

Actors involved

These are our actors:

  • Resource Owner
  • Client (aka Application)
  • Authorisation Server
  • Protected Resource

It should be relatively intuitive: an Application wants to access a Protected Resource owned by a Resource Owner. To do so, it requires a token. Tokens are emitted by an Authorisation Server, which is a third party entity that all the other actors trust.

Usually, when I read something new, I tend to quickly skip through the actors of a system. I probably shouldn't, but most of the time the paragraph describing, for example, a “User” ends up using many words just to tell me that it's, well, a user… So I try to look for the terms that are less intuitive and check whether any of them have characteristics I should pay particular attention to.

In OAuth2's specific case, I feel that the actor with the most confusing name is the Client.
Why do I say so? Because, in normal life (and in IT), it can mean many different things: a user, a specialised piece of software, a very generic piece of software…

I prefer to classify it in my mind as Application.

I'm stressing that the Client is the Application we want to delegate our permissions to. So, if the Application is, for example, a server side web application we access via a browser, the Client is not the user or the browser itself: the Client is the web application running in its own environment.

I think this is very important. The Client term is all over the place, so my suggestion is not to replace it entirely, but to force your brain to keep in mind the relationship Client = Application.

I also like to think that there is another not official Actor: the User-Agent.

I hope I won't confuse people here, because this is entirely something I use to build my mental map.
Despite not being defined in the specs, and not being present in all the different flows, it can help to identify this fifth actor in OAuth2 flows.
The User-Agent is most of the time impersonated by the web browser. Its responsibility is to enable an indirect propagation of information between two systems that are not talking directly to each other.
The idea is: A should talk to B, but it's not allowed to do so. So A tells C (the User-Agent) to tell B something.

It might still be a little confusing at the moment, but I hope I'll be able to clarify this later.

OAuth2 Core Business 2

OAuth2 is about how to obtain tokens.

Even if you are not an expert on OAuth2, as soon as someone mentions the topic you might immediately think about those pages from Google or the other major service providers that pop up when you try to log in to a new service on which you don't have an account yet, where you tell Google that yeah, you trust that service, and that you want to delegate some of the permissions you have on Google to that service.

This is correct, but this is just one of the multiple possible interactions that OAuth2 defines.

There are 4 main ones it's important you know. And this might come as a surprise if it's the first time you hear it:
not all of them end up showing you the Google-like permissions screen!
That's because you might want to leverage the OAuth2 approach even from a command line tool; maybe even without any UI at all capable of displaying an interactive web page to delegate permissions.

Remember once again: the main goal is to obtain tokens!

If you find a way to obtain one (the “how” part) and you are able to use it, you are done.

As we were saying, there are 4 ways defined by the OAuth2 framework. Sometimes they are called flows, sometimes they are called grants.
It doesn't really matter what you call them. I personally use flow, since it helps me remember that they differ from one another in the interactions you have to perform with the different actors to obtain tokens.

They are:

  • Authorisation Code Flow
  • Implicit Grant Flow
  • Client Credential Grant Flow
  • Resource Owner Credentials Grant Flow (aka Password Flow)

Each one of them is the suggested flow for specific scenarios.
To give you an intuitive example, there are situations where your Client is able to keep a secret (a server side web application) and others where it technically can't (a client side web application whose code you can entirely inspect with a browser).
Environmental constraints like these would make some of the steps defined in the full flow insecure (and useless). So, to keep things simpler, other flows have been defined in which the interactions that were impossible, or that were not adding any security-related value, have been entirely skipped.

OAuth2 Poster Boy: Authorisation Code Flow

We will start our discussion with Authorisation Code Flow for three reasons:

  • it’s the most famous flow, and the one that you might have already interacted with (it’s the Google-like delegation screen one)
  • it’s the most complex, articulated and inherently secure
  • the other flows are easier to reason about, when compared to this one

The Authorisation Code Flow, is the one you should use if your Client is trusted and is able to keep a secret. This means a server side web application.

How to get a token with Authorisation Code Flow

  1. All the involved Actors trust the Authorisation Server
  2. The User (Resource Owner) tells a Client (Application) to do something on his behalf
  3. The Client redirects the User to an Authorisation Server, adding some parameters: redirect_uri, response_type=code, scope, client_id
  4. The Authorisation Server asks the User if he wishes to grant the Client access to some resource on his behalf (delegation), with specific permissions (scope)
  5. The User accepts the delegation request, so the Authorisation Server sends an instruction to the User-Agent (Browser) to redirect to the URL of the Client. It also injects a code=xxxxx into this HTTP redirect instruction.
  6. The Client, which has been activated by the User-Agent thanks to the HTTP redirect, now talks directly to the Authorisation Server (bypassing the User-Agent), sending it client_id, client_secret and the code that had been forwarded.
  7. The Authorisation Server returns to the Client (not the browser) a valid access_token and a refresh_token

This is so articulated that it’s also called the OAuth2 dance!
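
To make the dance a little more concrete, here is a rough sketch of the key HTTP exchanges; the endpoint paths and values are illustrative, while the parameter names come from the OAuth2 spec:

# step 3: the Client redirects the User-Agent to the Authorisation Server
GET /authorize?response_type=code&client_id=my-client-1
    &redirect_uri=https://client.example.com/callback&scope=scope1%20scope2

# step 5: the Authorisation Server redirects back to the Client via the User-Agent
HTTP/1.1 302 Found
Location: https://client.example.com/callback?code=xxxxx

# step 6: the Client exchanges the code (plus its secret) for tokens, server to server
POST /token
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code&code=xxxxx&client_id=my-client-1&client_secret=my-client-secret&redirect_uri=https://client.example.com/callback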

Let’s underline a couple of points:

  • At step 3, we specify, among the other params, a redirect_uri. This is what implements the indirect communication we anticipated when we introduced the User-Agent as one of the actors. It's a key piece of information if we want to allow the Authorisation Server to forward information to the Client without a direct network connection open between the two.
  • the scope mentioned at step 3 is the set of permissions the Client is asking for
  • Remember that this is the flow you use when the Client is able to keep a secret. That is relevant at step 6, when the communication between the Client and the Authorisation Server avoids passing through the less secure User-Agent (which could sniff or tamper with the communication). This is also why it makes sense for the Client to add even more security by sending its client_secret, which is shared only between it and the Authorisation Server.
  • The refresh_token is used for subsequent automated calls the Client might need to make to the Authorisation Server. When the current access_token expires and it needs to get a new one, sending a valid refresh_token allows it to avoid asking the User again to confirm the delegation.

OAuth2 Got a token, now what?

OAuth2 is a framework, remember. What does the framework tell me to do now?

Well, nothing. =P

It’s up to the Client developer.

She could (and often should):

  • check if token is still valid
  • look up for detailed information about who authorised this token
  • look up what are the permissions associated to that token
  • any other operation that makes sense before finally giving access to a resource

These are all valid, and pretty obvious, points, right?
Does the developer have to figure out the best set of operations to perform next on her own?
She definitely can. Or she can leverage another specification: OpenID Connect (OIDC). More on this later.
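
As a hedged sketch of the "check if the token is still valid" step, many Authorisation Servers expose a token introspection endpoint in the style of RFC 7662 (the URL and the credentials here are made up); the response is a JSON document much like the richer token example shown later:

# ask the Authorisation Server what it knows about a token we have received
curl -X POST https://auth.example.com/introspect \
     -u my-client-1:my-client-secret \
     -d "token=363tghjkiu6trfghjuytkyen"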

OAuth2 - Implicit Grant Flow

It's the flow designed for Client applications that can't keep a secret. An obvious example is client side HTML applications. But even a binary application whose code is exposed to the public can be manipulated to extract its secrets.
Couldn't we have re-used the Authorisation Code Flow?
Yes, but… what's the point of step 6 if the secret is not a secure secret anymore? We don't get any protection from that additional step!
So the Implicit Grant Flow is similar to the Authorisation Code Flow, but it skips that now-useless step.
It aims to obtain access_tokens directly, without the intermediate step of obtaining a code first, to be exchanged together with a secret for an access_token.

It uses response_type=token to specify which flow to use while contacting the Authorisation Server.
Note also that there is no refresh_token. This is because it's assumed that user sessions will be short (due to the less secure environment) and that, anyhow, the user will still be around to re-confirm his will to delegate (this was the main use case that led to the definition of refresh_tokens).

OAuth2 - Client Credential Grant Flow

What if we don't have a Resource Owner, or if he's indistinguishable from the Client software itself (a 1:1 relationship)?
Imagine a backend system that just wants to talk to another backend system. No users involved.
The main characteristic of such an interaction is that it's no longer interactive, since we no longer have any user who is asked to confirm his will to delegate something.
It also implicitly defines a more secure environment, where you don't have to worry about users being able to read secrets.

It uses grant_type=client_credentials.

We are not detailing it here; just be aware that it exists, and that, just like the previous flow, it's a variation, a simplification actually, of the full OAuth dance that you are encouraged to use if your scenario allows it.

OAuth2 - Resource Owner Credentials Grant Flow (aka Password Flow)

Please raise your attention here, because you are about to be confused.

This is the scenario:
The Resource Owner has an account on the Authorisation Server. The Resource Owner gives his account details to the Client. The Client uses these details to authenticate to the Authorisation Server…

=O

If you have followed the discussion so far, you might be asking if I'm kidding you.
This is exactly the anti-pattern we tried to move away from at the beginning of our OAuth2 exploration!

How is it possible to find it listed here as possible suggested flow?

The answer is quite reasonable, actually: it's a possible first stop when migrating from a legacy system.
And it's actually a little better than the shared password antipattern:
the password is shared, but it's just a means to start the OAuth dance used to obtain tokens.

This allows OAuth2 to get its foot in the door when we don't have better alternatives.
It introduces the concept of access_tokens, and it can be used until the architecture is mature enough (or the environment changes) to allow a better and more secure flow for obtaining tokens.
Also, please notice that now tokens are the ad-hoc passwords that reach the Protected Resource system, while in the fully shared password antipattern it was our actual password that needed to be forwarded.

So, far from ideal, but at least it's justified by some criteria.

How to choose the best flow?

There are many decision flow diagrams on the internet. One of those that I like the most is this one:

OAuth2 Flows from https://auth0.com

It should help you remember the brief descriptions I have given you here and choose the easiest flow based on your environment.

OAuth2 Back to tokens - JWT

So, we are able to get tokens now. We have multiple ways to get them. We have not been told explicitly what to do with them, but with some extra effort and a bunch of additional calls to the Authorisation Server we can arrange something and obtain useful information.

Could things be better?

For example, we have assumed so far that our tokens might look like this:

{
   "access_token": "363tghjkiu6trfghjuytkyen",
   "token_type": "Bearer"
}

Could we have more information in it, to save us some round-trips to the Authorisation Server?

Something like the following would be better:

{
  "active": true,
  "scope": "scope1 scope2 scope3",
  "client_id": "my-client-1",
  "username": "paolo",
  "iss": "http://keycloak:8080/",
  "exp": 1440538996,
  "roles" : ["admin", "people_manager"],
  "favourite_color": "maroon",
  ... : ...
}

We'd be able to directly access some information tied to the Resource Owner's delegation.

Luckily someone else had the same idea, and they came up with JWT - JSON Web Tokens.
JWT is a standard that defines the structure of JSON based tokens representing a set of claims. Exactly what we were looking for!

Actually, the most important thing the JWT spec gives us is not the richer payload we have exemplified above, but the capability to trust the whole token without involving an Authorisation Server!

How is that even possible? The idea is not a new one: asymmetric signing (public key cryptography), defined, in the context of JWT, by the JOSE specs.

Let me refresh this for you:

In asymmetric signing, two keys are used to verify the validity of information.
These two keys are coupled, but one is secret, known only to the document creator, while the other is public.
The secret one is used to calculate a fingerprint of the document; a hash.
When the document is sent to its destination, the reader uses the public key, associated with the secret one, to verify whether the document and the fingerprint he has received are valid.
Digital signing algorithms tell us that the document is valid, according to the public key, only if it's been signed by the corresponding secret key.

The overall idea is: if our local verification passes, we can be sure that the message has been published by the owner of the secret key, so it’s implicitly trusted.

And back to our tokens use case:

We receive a token. Can we trust this token? We verify the token locally, without the need to contact the issuer. If, and only if, the verification based on the trusted public key passes, we confirm that the token is valid. No questions asked. If the token is valid according to its digital signature AND it's alive according to its declared lifespan, we can take the information it carries as true and we don't need to ask the Authorisation Server for confirmation!

As you can imagine, since we put all this trust in the token, it might be wise not to emit tokens with an excessively long lifespan:
someone might have changed his delegation preferences on the Authorisation Server, and that information might not have reached the Client, which still holds a valid and signed token it can base its decisions on.
Better to keep things a little more in sync, emitting tokens with a shorter lifespan, so that outdated preferences don't risk being trusted for long periods.
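
Just to fix the shape of a JWT in your mind, on the wire it is three base64url-encoded sections joined by dots; the header below is the standard one for RS256, while the payload and the signature are truncated, made-up placeholders:

header.payload.signature

eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJodHRw...(claims)....(signature)

# decoded header
{ "alg": "RS256", "typ": "JWT" }
# decoded payload: a JSON set of claims, like the richer token shown above
# signature: computed over header.payload with the issuer's private key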

OpenID Connect

I hope this section won’t disappoint you, but the article was already long and dense with information, so I’ll keep it short on purpose.

OAuth2 + JWT + JOSE ~= OpenID Connect

Once again: OAuth2 is a framework.
The OAuth2 framework is used in conjunction with the JWT spec, JOSE and other ideas we are not going to detail here, to create the OpenID Connect specification.

The idea you should bring home is that, more often than not, you are probably interested in using and leveraging OpenID Connect, since it puts together the best of the approaches and ideas described here.
You are, yes, leveraging OAuth2, but you are now within the much more defined bounds of OpenID Connect, which gives you richer tokens and support for Authentication, something that was never covered by plain OAuth2.

Some online services let you choose between OAuth2 and OpenID Connect. Why is that?
Well, when they mention OpenID Connect, you know that you are using a standard. Something that will behave the same way even if you switch implementation.
The OAuth2 option you are given is probably something very similar, potentially with some killer feature you might be interested in, but custom built on top of the more generic OAuth2 framework.
So be cautious with your choice.

Conclusion

If you are interested in this topic, or if this article has only confused you more, I suggest you check out OAuth 2 in Action by Justin Richer and Antonio Sanso.
On the other hand, if you want to test your fresh knowledge and try to apply it to an open source Authorisation Server, I definitely recommend playing with Keycloak, which is capable of everything we have described here and much more!

Tuesday, May 24, 2016

JBoss Fuse: dynamic Blueprint files with JEXL

In this post I’ll show how to add a little bit of inline scripting in your Apache Aries Blueprint xml files.

I wouldn't necessarily call it a best practice, but I have always had the idea that this capability might be useful; I probably started wanting it when I was forced to use xml to simulate imperative programming structures, like when using Apache Ant.

And I have found the idea validated in projects like Gradle or Vagrant, where a full programming language is actually hiding in disguise, pretending to be a Domain Specific Language or a surprisingly flexible configuration syntax.

I have talked in the past about something similar, when showing how to use MVEL in JBoss Fuse.
This time I will limit myself to showing how to use small snippets of code that can be inlined in your otherwise static xml files, a trick that might turn out useful in case you need to perform simple operations like string replacement, arithmetic or anything else for which you want to avoid writing a java class.

Let me say that I'm not inventing anything new here. I'm just showing how to use a functionality that is provided directly by the Apache Aries project, but that I haven't seen used that often out there.

The goal is to allow you to write a snippet like this:
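
Something along these lines; a minimal sketch where the property name and the variable are illustrative (the point is that the value is computed by an inline expression applied to an environment variable):

<!-- sketch: the value is computed at resolution time by an inline JEXL expression -->
<property name="transformedPath"
          value="${ SOME_ENV_VARIABLE.replaceAll('-', '_') }"/>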


You can see that we are invoking java.lang.String.replaceAll() method on the value of an environment variable.

We can do this thanks to the Apache Aries Blueprint JEXL Evaluator, an extension to Apache Aries Blueprint that implements a custom token processor which “extends” the base functionality of Aries Blueprint.

In this specific case, it does so by delegating the token interpolation to the Apache JEXL project.

JEXL, the Java Expression Language, is just a library that exposes scripting capabilities to the java platform. It's not unique in what it does, since you could achieve the same with the native support for Javascript, or with Groovy for instance. But we are going to use it since the integration with Blueprint has already been written, so we can use it straight away on our Apache Karaf or JBoss Fuse instance.

The following instructions have been verified on JBoss Fuse 6.2.1:

# install JEXL bundle
install -s mvn:org.apache.commons/commons-jexl/2.1.1 
# install JEXL Blueprint integration:
install -s mvn:org.apache.aries.blueprint/org.apache.aries.blueprint.jexl.evaluator/1.0.0

That was all the preparation we needed; now we just need to use the correct XSD version, 1.2.0, in our Blueprint file:

xmlns:ext="http://aries.apache.org/blueprint/xmlns/blueprint-ext/v1.2.0"

Once that is done, we can leverage the functionality in this way:
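
A minimal sketch of such a blueprint.xml could be the following; the ext:property-placeholder with evaluator="jexl" is the actual mechanism, while the published service and the specific expression are illustrative reconstructions (judging from the shell output below, the example computes the osgi.jndi.service.name of a String service from the installation path):

<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
           xmlns:ext="http://aries.apache.org/blueprint/xmlns/blueprint-ext/v1.2.0">

    <!-- activate the JEXL evaluator for property interpolation -->
    <ext:property-placeholder evaluator="jexl"/>

    <!-- illustrative: publish a service whose JNDI name is computed by an inline JEXL expression -->
    <service interface="java.lang.String">
        <service-properties>
            <entry key="osgi.jndi.service.name"
                   value="${ karaf.base.toUpperCase() }"/>
        </service-properties>
        <bean class="java.lang.String"/>
    </service>

</blueprint>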

Copy that blueprint.xml directly into the deploy/ folder, and you can check from the Karaf shell that the dynamic invocation of those inline scripts has actually happened!

JBossFuse:karaf@root> ls (id blueprint.xml) | grep osgi.jndi.service.name
osgi.jndi.service.name = /OPT/RH/JBOSS-FUSE-6.2.1.REDHAT-107___3

This might turn out useful in specific scenarios, when you are looking for a quick way to create dynamic configuration.

In case you are interested in implementing your own custom evaluator, this is the interface you need to provide an implementation of:

https://github.com/apache/aries/blob/trunk/blueprint/blueprint-core/src/main/java/org/apache/aries/blueprint/ext/evaluator/PropertyEvaluator.java

And this is an example of the service you need to expose to be able to refer to it in your <property-placeholder> node:
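
A sketch of that registration follows; the evaluator name and the implementation class are hypothetical, and the service property key is the one Aries uses to look evaluators up, to the best of my understanding:

<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">

    <!-- hypothetical custom evaluator implementation -->
    <bean id="myEvaluator" class="com.example.MyPropertyEvaluator"/>

    <service ref="myEvaluator"
             interface="org.apache.aries.blueprint.ext.evaluator.PropertyEvaluator">
        <service-properties>
            <!-- the name you will then reference in evaluator="..." -->
            <entry key="org.apache.aries.blueprint.ext.evaluator.name" value="myEvaluator"/>
        </service-properties>
    </service>

</blueprint>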

Tuesday, April 26, 2016

Deploy and configure a local Docker caching proxy

Recently I was looking into caching of Docker layer downloads for the Fabric8 development environment, to allow me to trash the vms where my Docker daemon was running and still avoid re-downloading basic images every single time I recreate a vm.

As usual, I tried to hit Google first, and was pointed to these couple of pages:

http://unpoucode.blogspot.it/2014/11/use-proxy-for-speeding-up-docker-images.html

and

https://blog.docker.com/2015/10/registry-proxy-cache-docker-open-source/

Since I quite often use Jerome Petazzoni's approach of a transparent Squid + iptables setup to cache and sniff simple http traffic (usually http invocations from java programs), I felt that the first solution made sense, so I tried that first.

It turned out that the second link was what I was looking for; but I still spent some good learning hours with the no-longer-working suggestions from the first one, learning the hard way that Squid doesn't play that nicely with Amazon's Cloudfront CDN, used by Docker Hub. But I have to admit that it's been fun.
Now I know how to forward calls from Squid to an intermediate interceptor that mangles query params, headers and everything else.
I couldn't find a working combination for Cloudfront, but I am now probably able to reproduce the infamous Cats Proxy Prank. =)

Anyhow, as I was saying, what I was really looking for is that second link, which shows you how to set up an intermediate Docker proxy that your Docker daemon will try to hit before defaulting to the usual Docker Hub public servers.

Almost everything I needed was on that page, but I found the information to be a little more cryptic than needed.

The main reason for that is that the example assumes I need security (TLS), which is not really my case, since the proxy is completely local.

Additionally, it shows how to configure your Docker Registry using a YAML configuration file. Again, not really complex, but indeed more than needed.

Yes, because what you really need to bring up the simplest local (not secured) Docker proxy is this one-liner:

docker run -p 5000:5000 -d --restart=always --name registry   \
  -e REGISTRY_PROXY_REMOTEURL=http://registry-1.docker.io \
  registry:2

The interesting part here is that the registry image supports a smart alternative way to forward configuration to it, which saves you from passing it a YAML configuration file.

The idea, described here, is that if you follow a naming convention for the environment variables that reflects the hierarchy of the YAML tree, you can turn something like:

...
proxy:
  remoteurl: http://registry-1.docker.io
...

which you would otherwise have to write in a .yaml file and pass to the process in this way:

docker run -d -p 5000:5000 --restart=always --name registry \
  -v `pwd`/config.yml:/etc/docker/registry/config.yml \
  registry:2

into the much more convenient -e REGISTRY_PROXY_REMOTEURL=http://registry-1.docker.io runtime environment variable!

Let's improve the example a little, so that we also give our Docker proxy a non-volatile storage location for the cached layers, and we don't lose them between invocations:

docker run -p 5000:5000 -d --restart=always --name registry   \
  -e REGISTRY_PROXY_REMOTEURL=http://registry-1.docker.io \
  -v /opt/shared/docker_registry_cache:/var/lib/registry \
  registry:2

Now we have everything we need to save a good share of bandwidth each time we pull a Docker image that has already passed through our local proxy.

The only remaining bit is to tell our Docker daemon to be aware of the new proxy:

# update your docker daemon config, according to your distro
# content of my `/etc/sysconfig/docker` in Fedora 23
OPTIONS=" --registry-mirror=http://localhost:5000"
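
On distributions where the daemon is configured via /etc/docker/daemon.json rather than an OPTIONS line, the equivalent setting should be something along these lines (a sketch; check your distro's documentation for the exact location):

{
  "registry-mirrors": ["http://localhost:5000"]
}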

Reload (or restart) your Docker daemon and you are done! Just be aware that if you restart the daemon you might also need to restart the Registry container, if you are working on a single node.

An interesting discovery has been learning that the Docker daemon doesn't break if it cannot find the specified registry-mirror. So you can add the configuration and forget about it, knowing that your interaction with Docker Hub will simply benefit from possible hits on your caching proxy, whenever it's running.

You can see it working with the following tests:

docker logs -f registry

will log all the outgoing download requests, and once the set of requests that compose a single image pull operation has completed, you will also be able to check that the image is now completely served by your proxy with this invocation:

curl http://localhost:5000/v2/_catalog
# sample output
{"repositories":["drifting/true"]}

The article could end here, but since I feel bad showing how to disable security on the internet, here's also a very short, fully working and tested example of how to implement the same thing with TLS enabled:

# generate a self signed certificate; accept the default for every value apart from Common Name, where you have to put your box hostname
mkdir -p certs && openssl req  -newkey rsa:4096 -nodes -sha256 -keyout certs/domain.key  -x509 -days 365 -out certs/domain.crt

# copy to the locally trusted ones, steps for Fedora/Centos/RHEL
sudo cp certs/domain.crt /etc/pki/ca-trust/source/anchors/

# load the newly added certificate
sudo update-ca-trust enable

# run the registry using the keys that you have generated, mounting the files inside the container
docker run -p 5000:5000 --restart=always --name registry \
  -v `pwd`/certs:/certs \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2

# now you just need to remember that you are working in https, so use that protocol in your docker daemon configuration instead of plain http; also use it when you interact with the API via curl

Monday, October 19, 2015

Debugging tip: How to simulate a slow hard disk

As a Software Engineer there are times when you’d like to have a slower system.

It doesn't happen really often, actually: usually it's when someone reports a bug in your software that you have never seen before and that you cannot reproduce.

Most of the time, the reason for those ghost bugs is race conditions.

Race conditions are issues you might face with multithreaded programming. Imagine that your software does multiple things at the same time. Despite the fact that most of the time those things happen in the expected and intuitive order, sometimes they don't, leading to unexpected states in your program.

They are indeed a bug. It's the dev's fault. But the dev doesn't have many ways to protect himself from them. There are programming styles and technologies that push you to avoid the risk altogether, but I think that, in general, they are a condition every developer has to be familiar with.

So why does a slower system help?

Because most of the time, race conditions are masked by the fact that the operations still happen “reasonably quickly”. This ambiguous “reasonably quickly” is the main issue. There is no clear limit or number that tells you how quickly. You just have higher chances of seeing them if things are slow enough to show that they are not happening in the correct order, or that they are not waiting for the correct checkpoints.

In my experience with java applications, the main performance-related aspect when reproducing race conditions is disk access speed. More than CPU speed or the amount of RAM, I have noticed that disk speed is the biggest differentiator between similar systems.

In this post I will show how to simulate a slow hard disk on Linux to increase your chances of reproducing race conditions.

The solution will be based on nbd and trickle, and it will use the network layer to regulate the I/O throughput of your virtual hard disk.

I'd like to start by adding that this isn't anything new and that I'm not suggesting any particularly revolutionary approach. There are many blog posts out there that describe how to achieve this. But for multiple reasons, none of those I have read worked out of the box on my Fedora 22 or Centos 6 installations.
That is the main reason that pushed me to give back to the internet, adding what might be just another page on the subject.

Let's start with the idea of using nbd, or Network Block Device, to simulate our hard disk.

As far as I understand, there is no official way exposed by the Linux kernel to regulate the I/O speed of generic block devices.

Over the internet you may find many suggestions, spanning from disabling read and write caches to generating real load that could make your system busy.

QoS can be enforced on the network layer though. So the idea is to emulate a block device over the network.

Despite the fact that this might sound very complicated (and maybe it is), the problem has already been solved by the Linux Kernel with the nbd module.

Since on my Fedora 22 that module is not enabled by default, we have to install it first, and then load it:

# install nbd module
sudo yum install nbd

# load nbd module
sudo modprobe nbd

# check nbd module is really loaded
lsmod | grep nbd
nbd                    20480  0

Now that nbd is installed and the module loaded we create a configuration file for its daemon:

# run this command as root
cat > /etc/nbd-server/config <<EOF
[generic]
[test]
    exportname = /home/pantinor/test_nbd
    copyonwrite = false
EOF

Where exportname is the path to a file that will represent your slow virtual hard disk.

You can create the file with this command:

# create a 1 GB file filled with zeros to back the virtual device
dd if=/dev/zero of=/home/pantinor/test_nbd bs=1G count=1

Now that the config and the destination file are in place, you can start the nbd-server daemon:

# start ndb-server daemon
sudo systemctl start nbd-server.service

# monitor the daemon start up with:
journalctl -f --unit nbd-server.service

At this point you have a server process listening on port 10809, which any client on your network can connect to in order to mount the export as a network block device.
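
If you want to double check that the daemon is really listening before connecting to it, a quick look at the listening sockets should be enough (assuming ss is available on your distribution):

# verify that nbd-server is listening on its default port
sudo ss -tlnp | grep 10809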

We can connect to it with this command:

# "test" corresponds to the configuration section in daemon  config file
sudo nbd-client -N test  127.0.0.1 10809  /dev/nbd0
# my CentOS 6 version of nbd-client needs a slightly different syntax:
#    sudo nbd-client -N test  127.0.0.1   /dev/nbd0

We have now created a virtual block device called /dev/nbd0, and we can format it as if it were a normal one:

# format device
sudo mkfs /dev/nbd0 

# create folder for mounting
sudo mkdir /mnt/nbd

# mount device, sync option is important to not allow the kernel to cheat!
sudo mount -o sync /dev/nbd0 /mnt/nbd

# add write permissions to everyone
sudo chmod a+rwx /mnt/nbd

Note that we have passed the -o sync flag to the mount command. This flag has an important function: it disables an optimization in the Linux kernel that delays the completion of write operations to the device. Without it, all write operations would look instantaneous, while the kernel actually completed the write requests in the background. With this flag, instead, every operation waits until the write has really completed.

You can now check that you are able to read and write on the mount point /mnt/nbd.
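
A quick sanity check could look like this (the file name is just an example):

# write a small file on the slow mount and read it back
echo "hello slow disk" > /mnt/nbd/hello.txt
cat /mnt/nbd/hello.txt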

Let’s now temporarily unmount and disconnect from nbd-server:

sudo umount /mnt/nbd

sudo nbd-client -d /dev/nbd0

And let’s introduce trickle.

Trickle is a tool you can use to wrap other processes and limit their network bandwidth.

You can use it to limit any other program. A simple test you can perform is to use it with curl:

# download a sample file and limits download speed to 50 KB/s
trickle -d 50 -u 50  curl -O  http://download.thinkbroadband.com/5MB.zip

Now, as you might expect, we just need to combine trickle and nbd-server to obtain the desired behavior.

Let’s start by stopping the current nbd-server daemon, to free up its default port:

sudo systemctl stop nbd-server.service

And let’s start it via trickle:

# start nbd-server limiting its network throughput
trickle -d 20 -u 20 -v nbd-server -d

The -d flag attaches the server process to the console, so the console will be blocked and will be freed only once you close the process or a client disconnects.
You can safely ignore the error message: trickle: Could not reach trickled, working independently: No such file or directory
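
If you prefer not to keep a console busy, you can probably just run the wrapped server in the background instead; this is an illustrative variation, not what I used above:

# run the throttled nbd-server in the background, logging to a temporary file
nohup trickle -d 20 -u 20 nbd-server -d > /tmp/nbd-trickle.log 2>&1 &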

Now you can re-issue the commands to connect to nbd-server and re-mount it:

sudo nbd-client -N test  127.0.0.1 10809  /dev/nbd0

sudo mount -o sync /dev/nbd0 /mnt/nbd

And you are done! You now have a slow hard disk, exposed as /dev/nbd0 and mounted on /mnt/nbd.

You can verify the slow behavior in this way:

sudo dd if=/dev/nbd0 of=/dev/null bs=65536 skip=100 count=10
10+0 records in
10+0 records out
655360 bytes (655 kB) copied, 18.8038 s, 34.9 kB/s

# when run against an nbd-server that doesn't use trickle the output is:
# 655360 bytes (655 kB) copied, 0.000723881 s, 905 MB/s

Now that you have a slow partition, you can simply put your software’s files there to simulate slow I/O.
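
As a sketch of what that could look like (paths and the service name are purely hypothetical), you can relocate your application’s data directory onto the slow mount with a bind mount:

# stop the application, move its data onto the slow device and bind-mount it back
sudo systemctl stop myapp
sudo cp -a /opt/myapp/data /mnt/nbd/myapp-data
sudo mount --bind /mnt/nbd/myapp-data /opt/myapp/data
sudo systemctl start myapp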

All the above steps can be wrapped into helper scripts that make the process much simpler, like those described here: http://philtortoise.blogspot.it/2013/09/simulating-slow-drive.html.

Monday, October 5, 2015

JBoss Fuse - Turn your static config into dynamic templates with MVEL

Recently I rediscovered a JBoss Fuse functionality that I had forgotten about, and I thought that other people out there might benefit from this reminder.

This post focuses on JBoss Fuse and Fabric8, but it might also interest any developer looking for a minimally invasive way to add some degree of dynamic behavior to static configuration files.

The idea of dynamic configuration in OSGi and in Fabric8

The OSGi framework is most often remembered for its class-loading behavior. But apart from that, it also defines other concepts and functionality that the framework has to implement.
One of them is ConfigAdmin.

ConfigAdmin is a service that lets you define externalized sets of properties files that are logically bound to your deployment units.

The lifecycle of these external properties files is linked to the OSGi bundle lifecycle: if you modify an external properties file, your bundle is notified.
Depending on how you coded your bundle, you can decide to react to the notification and, programmatically or via helper frameworks like Blueprint, invoke code that uses the new configuration.

This mechanism is handy and powerful, and all developers using OSGi are familiar with it.
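
To make this a bit more concrete, here is a minimal sketch of what it usually looks like on a Karaf-based container, where ConfigAdmin configurations are materialized as etc/<PID>.cfg files (the PID com.example.myapp and the property name are made up for illustration):

# create or update the configuration bound to the hypothetical PID com.example.myapp;
# bundles that registered an interest in that PID (for example via Blueprint's
# cm:property-placeholder with update-strategy="reload") are notified of the change
echo "greeting = Hello from ConfigAdmin" >> etc/com.example.myapp.cfg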

Fabric8 builds on the idea of ConfigAdmin, and extends it.

With its provisioning capabilities, Fabric8 defines the concept of a Profile, which encapsulates deployment units and configuration. It adds a layer of functionality on top of plain OSGi and allows you to manage any kind of deployment unit, not only OSGi bundles, as well as any kind of configuration or static file.

If you check the official documentation you will find the list of “extensions” that the Fabric8 layer offers, and you will learn that they are divided mainly into two groups: Url Handlers and Property Resolvers.

I suggest that everyone interested in this technology dig through the documentation; but to offer a brief summary and a short example, imagine that your Fabric profiles have the capability to resolve some values at runtime using specific placeholders.

ex.

# sample url handler usage, ResourceName is a filename relative to the namespace of the containing Profile:
profile:ResourceName

# sample property handler, the value is read at deploy time, from the Apache Zookeeper distributed registry that is published when you run JBoss Fuse
${zk:/fabric/registry/containers/config/ContainerName/Property}

There are multiple handlers available out of the box, covering what the developers thought were the most common use cases: Zookeeper, Profiles, Blueprint, Spring, System Properties, Managed Ports, etc.

And you might also think of extending the mechanism by defining your own extension: for example, if you want to react to performance metrics you are storing on some system, you can write an extension, with its own syntax convention, that injects values taken from that system.

The limit of all this power: static configuration files

The capabilities I have introduced above are exciting and powerful but they have an implicit limit: they are available only to .properties files or to files that Fabric is aware of.

This means that this functionality is available if you manage Fabric profiles, OSGi properties or other specific technologies that interact with them, like Camel, but it is not enabled for anything that is Fabric-unaware.

Imagine you have custom code that reads an .xml configuration file, and imagine that this code doesn’t reference any Fabric object or service.
Your code will process that .xml file as-is. There won’t be any magic replacement of tokens or paths, because even though you are running inside Fabric, you are NOT using any directly supported technology and you are NOT notifying Fabric that you might want its services.

To solve this problem you have 3 options:

  1. You write an extension to Fabric that recognizes your static resources and delegates the dynamic replacement to the framework code.
  2. You alter the code contained in your deployment unit, and instead of consuming the static resources directly you ask the Fabric services to interpolate them for you.
  3. You use the mvel: url handler (and avoid touching any other code!)

What is MVEL?

MVEL is actually a programming language: https://en.wikipedia.org/wiki/MVEL .
In particular, it’s also a scripting language that you can run directly from source, skipping the compilation step.
It has several characteristics that make it interesting to embed within another application and use to define new behaviors at runtime. For all these reasons, for example, it’s also one of the supported languages of the JBoss Drools project, which works with business rules you might want to define or modify at runtime.

Why can it be useful to us? Mainly for two reasons:

  1. it works well as a templating language
  2. Fabric8 already has an mvel: url handler that implicitly also acts as a resource handler!

Templating language

Templating languages are that family of languages (often domain-specific languages) where you can alternate static portions of text, read as-is, with dynamic instructions that are processed at parsing time. I’m probably saying in a more complicated way the same idea I already introduced above: you can have tokens in your text that are translated following a specific convention.

This sounds exactly like the capabilities provided by the handlers we introduced above, with an important difference: while those were context-specific handlers, MVEL is a general-purpose technology. So don’t expect it to know anything about Zookeeper or Fabric profiles, but do expect it to support generic programming-language concepts like loops, code invocation, reflection and so on.

Fabric supports it!

A reference to the support in Fabric can be found here: http://fabric8.io/gitbook/urlHandlers.html

But let me add a snippet of the original code that implements the functionality, since this is the part where you might find this approach interesting even outside the context of JBoss Fuse:

https://github.com/fabric8io/fabric8/blob/1.x/fabric/fabric-core/src/main/java/io/fabric8/service/MvelUrlHandler.java#L115-L126

public InputStream getInputStream() throws IOException {
  assertValid();
  String path = url.getPath();
  URL url = new URL(path);
  CompiledTemplate compiledTemplate = TemplateCompiler.compileTemplate(url.openStream());
  Map<String, Object> data = new HashMap<String, Object>();
  Profile overlayProfile = fabricService.get().getCurrentContainer().getOverlayProfile();
  data.put("profile", Profiles.getEffectiveProfile(fabricService.get(), overlayProfile));
  data.put("runtime", runtimeProperties.get());
  String content = TemplateRuntime.execute(compiledTemplate, data).toString();
  return new ByteArrayInputStream(content.getBytes());
}

What’s happening here?

First, since it’s not shown in the snippet, remember that this is a url handler. This means that the behavior gets triggered for files that are referred to via a specific URI; in this case it’s mvel:. For example, a valid path might be mvel:jetty.xml.

The other interesting and relatively simple thing to notice is the interaction with the MVEL interpreter.
As in most templating technologies, even the simplest ones you could implement yourself, you usually have:

  • an engine/compiler, here it’s TemplateCompiler
  • a variable that contains your template, here it’s url
  • a variable that represents your context, that is the set of variables you want to expose to the engine, here data

Put them all together, ask the engine to do its job, here with TemplateRuntime.execute(...), and what you get as output is a plain static String: no templating instructions any more, since all the logic your template defined has been applied, possibly augmented with input values taken from the context.

An example

I hope my explanation has been simple enough, but an example is probably the best way to express the concept.

Let’s use jetty.xml, contained in the JBoss Fuse default.profile: it’s a static resource that JBoss Fuse doesn’t treat as a special file, so it doesn’t offer any replacement functionality for it.

I will show both aspects of the MVEL integration here: reading a value from the context variables and applying programmatic logic (just the sum of two integers in this case):

<Property name="jetty.port" default="@{  Integer.valueOf( profile.configurations['org.ops4j.pax.web']['org.osgi.service.http.port'] ) + 10  }"/>

We are modifying the default value for the Jetty port, taking its initial value from the “profile” context variable, which is a Fabric-aware object with access to the rest of the configuration:

profile.configurations['org.ops4j.pax.web']['org.osgi.service.http.port']

we explicitly convert it from String to Integer:

Integer.valueOf( ... )

and we add a static value of 10 to the returned value:

.. + 10

Let’s save the file, stop our Fuse instance, restart it and re-create a test Fabric:

# in Fuse CLI shell
shutdown -f

# in bash shell
rm -rf data instances

bin/fuse

# in Fuse CLI shell
fabric:create --wait-for-provisioning

Just wait and monitor logs and… Uh-oh. An error! What’s happening?

This is the error:

2015-10-05 12:00:10,005 | ERROR | pool-7-thread-1  | Activator                        | 102 - org.ops4j.pax.web.pax-web-runtime - 3.2.5 | Unable to start pax web server: Exception while starting Jetty
java.lang.RuntimeException: Exception while starting Jetty
at org.ops4j.pax.web.service.jetty.internal.JettyServerImpl.start(JettyServerImpl.java:143)[103:org.ops4j.pax.web.pax-web-jetty:3.2.5]
…
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)[:1.7.0_76]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)[:1.7.0_76]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)[:1.7.0_76]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)[:1.7.0_76]
at org.eclipse.jetty.xml.XmlConfiguration$JettyXmlConfiguration.set(XmlConfiguration.java:572)[96:org.eclipse.jetty.aggregate.jetty-all-server:8.1.17.v20150415]
at org.eclipse.jetty.xml.XmlConfiguration$JettyXmlConfiguration.configure(XmlConfiguration.java:396)[96:org.eclipse.jetty.aggregate.jetty-all-server:8.1.17.v20150415]
…
Caused by: java.lang.NumberFormatException: For input string: "@{profile.configurations['org.ops4j.pax.web']['org.osgi.service.http.port'] + 1}"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)[:1.7.0_76]
at java.lang.Integer.parseInt(Integer.java:492)[:1.7.0_76]
at java.lang.Integer.<init>(Integer.java:677)[:1.7.0_76]
… 29 more

If you notice, the error message says that our template snippet cannot be converted to a Number.

Why is our template snippet displayed in the first place? The templating engine should have done its part of the job and given us back a static String without any reference to templating directives!

I have shown you this error on purpose, to insist on a concept I described above but that might go unnoticed at first.

MVEL support in Fabric is implemented as a url handler.

So far we have just modified the content of a static resource file, but we haven’t given Fabric any hint that we’d like to handle that file as an MVEL template.

How to do that?

It’s just a matter of using the correct uri to refer to that same file.

So, modify the file default.profile/org.ops4j.pax.web.properties, which is the place in the default Fabric profile where you define which static file contains the Jetty configuration:

# change it from org.ops4j.pax.web.config.url=profile:jetty.xml to
org.ops4j.pax.web.config.url=mvel:profile:jetty.xml

Now, stop the instance again, remove the Fabric configuration files, recreate a Fabric and notice how your Jetty instance is running correctly.

We can check it in this way:

JBossFuse:karaf@root> config:list | grep org.osgi.service.http.port
   org.osgi.service.http.port = 8181

From your browser you can verify that Hawtio, the JBoss Fuse web console deployed on top of Jetty, is now accessible on port 8191: http://localhost:8191/hawtio

Tuesday, February 17, 2015

JBoss Fuse - Some less known tricks

TL;DR

  1. expose java static calls as Karaf shell native commands
  2. override OSGi Headers at deploy time
  3. override OSGi Headers after deploy time with OSGi Fragments

Expose java static calls as Karaf shell native commands

As part of my job as a software engineer collaborating with support engineers and customers, I very often find myself needing to extract additional information from a system I don't have access to.
The usual approaches, valid for all kinds of software, are extracting logs, invoking interactive commands to obtain specific outputs or, in the most complex cases, deploying some PoC unit meant to verify a specific behavior.

JBoss Fuse, and Karaf, the platform it's based on, already do a great job of exposing all that data.

You have:

  • extensive logs and integration with Log4j
  • an extensive list of JMX operations (which you can also invoke over HTTP with Jolokia)
  • a large list of shell commands

But sometimes this is not enough. If you have seen my previous post about how to use Byteman on JBoss Fuse, you can imagine all the other cases:

  1. you need to print values that are not logged or returned in the code
  2. you might need to short-circuit some logic to hit a specific execution branch of your code
  3. you want to inject a line of code that wasn't there at all

Byteman is still a very good option for this, but Karaf has a facility of its own that we can use to run custom code.

Karaf allows you to write code directly in its shell, and to record these bits of code as macros you can re-invoke. Such a macro will look like a native Karaf shell command!

Let's see a real example I had to implement:

verify whether the JVM running my JBoss Fuse instance was resolving a specific DNS name as expected.

The standard JDK has a method you can invoke to resolve a DNS name:
InetAddress.getAllByName(String)

Since that call is simple enough, meaning it doesn't require complex or structured input, I thought I could turn it into an easy-to-reuse command:

# add all public static methods on a java class as commands  to the namespace "my_context": 
# bundle 0 is because system libs are served by that bundle classloader
addcommand my_context (($.context bundle 0) loadClass java.net.InetAddress) 

That funky line is explained in this way:

  • addcommand is the Karaf shell functionality that registers new commands
  • my_context is the namespace/prefix you will attach your command to. In my case, "dns" would have made a good namespace.
  • ($.context bundle 0) invokes Java code. In particular, we are invoking $.context, a built-in instance the Karaf shell exposes to give access to the OSGi framework, whose type is org.apache.felix.framework.BundleContextImpl. We invoke its method bundle, passing it the argument 0, the id of the bundle whose classloader is responsible for loading the JDK classes. That call returns an instance of org.apache.felix.framework.Felix that we can use to load the specific class definition we need, java.net.InetAddress.

As the inline comment says, an invocation of addcommand exposes all the public static methods of that class. So we are now allowed to invoke those methods, and in particular the one that can resolve DNS entries:

JBossFuse:karaf@root> my_context:getAllByName "www.google.com"
www.google.com/74.125.232.146
www.google.com/74.125.232.145
www.google.com/74.125.232.148
www.google.com/74.125.232.144
www.google.com/74.125.232.147
www.google.com/2a00:1450:4002:804:0:0:0:1014

This functionality is described on the Karaf documentation page.

Override OSGi Headers at deploy time

If you work with Karaf, you are working with OSGi, love it or hate it.
A typical step in every OSGi workflow is playing (or fighting) with OSGi headers.
If you are in total control of your project, this might be more or less easy, depending on the relationship between your deployment units. See Christian Posta's post to get a glimpse of some less-than-obvious examples.

In those conditions, a very typical situation is the one where you have to use a bundle, yours or someone else's, whose headers are not correct.
What you very often end up doing is re-packaging that bundle, so that you can alter the content of its MANIFEST and add the OSGi headers you need.

Karaf has a facility in this regard, called the wrap protocol.
You might already know it as a shortcut to deploy a non-bundle jar on Karaf, but it's actually more than just that.
What it really does, as the name suggests, is to wrap. And it can wrap both non-bundles and bundles!
This means we can also use it to alter the metadata of an already packaged bundle we are about to install.

Let's give an example, again taken from a real-life experience.
Apache HttpClient is not totally OSGi friendly. We can install it on Karaf with the wrap: protocol and export all its packages.

JBossFuse:karaf@root> install -s 'mvn:org.apache.httpcomponents/httpclient/4.2.5'
Bundle ID: 257
JBossFuse:karaf@root> exports | grep -i 257
   257 No active exported packages. This command only works on started bundles, use osgi:headers instead
JBossFuse:karaf@root> install -s 'wrap:mvn:org.apache.httpcomponents/httpclient/4.2.5$Export-Package=*; version=4.2.5'
Bundle ID: 259
JBossFuse:karaf@root> exports | grep -i 259
   259 org.apache.http.client.entity; version=4.2.5
   259 org.apache.http.conn.scheme; version=4.2.5
   259 org.apache.http.conn.params; version=4.2.5
   259 org.apache.http.cookie.params; version=4.2.5
...

And we can see that it works with plain bundles too:

JBossFuse:karaf@root> la -l | grep -i camel-core
[ 142] [Active     ] [            ] [       ] [   50] mvn:org.apache.camel/camel-core/2.12.0.redhat-610379
JBossFuse:karaf@root> install -s 'wrap:mvn:org.apache.camel/camel-core/2.12.0.redhat-610379$overwrite=merge&Bundle-SymbolicName=paolo-s-hack&Export-Package=*; version=1.0.1'
Bundle ID: 269

JBossFuse:karaf@root> headers 269

camel-core (269)
----------------
...

Bundle-Vendor = Red Hat, Inc.
Bundle-Activator = org.apache.camel.impl.osgi.Activator
Bundle-Name = camel-core
Bundle-DocURL = http://redhat.com
Bundle-Description = The Core Camel Java DSL based router

Bundle-SymbolicName = paolo-s-hack

Bundle-Version = 2.12.0.redhat-610379
Bundle-License = http://www.apache.org/licenses/LICENSE-2.0.txt
Bundle-ManifestVersion = 2

...

Export-Package = 
    org.apache.camel.fabric;
        uses:="org.apache.camel.util,
            org.apache.camel.model,
            org.apache.camel,
            org.apache.camel.processor,
            org.apache.camel.api.management,
            org.apache.camel.support,
            org.apache.camel.spi";
        version=1.0.1,

...

Where you can see that Bundle-SymbolicName and the version of the exported packages carry the values I set.

Again, the functionality is described in the Karaf docs, and you might find the wrap protocol reference useful.

Override OSGi Headers after deploy time with OSGi Fragments

The previous trick is powerful, but it probably requires you to remove the original bundle if you don't want to risk having half of the classes exposed by one classloader and the remaining ones (those packages you might have added in the overridden Export-Package) by another one.

There is actually a better way to override OSGi headers, and it comes directly from an OSGi standard functionality: OSGi Fragments.

If you are not familiar with the concept, the definition taken directly from the OSGi wiki is:

A Bundle fragment, or simply a fragment, is a bundle whose contents are made available to another bundle (the fragment host). Importantly, fragments share the classloader of their parent bundle.

That page also gives a further hint about what I am about to describe:

Sometimes, fragments are used to 'patch' existing bundles.

We can use this strategy to:

  • inject .jars in the classpath of our target bundle
  • alter headers of our target bundle

I have used the first case to fix a badly configured bundle that was looking for an XML configuration descriptor it didn't include, which I provided by deploying a light fragment bundle containing just that file.

But the use case I want to show you here is instead an improvement in the way Byteman is deployed on JBoss Fuse/Karaf.

If you remember my previous post, since Byteman classes need to be available to every other deployed bundle and potentially need access to every available class, we had to add the Byteman packages to the org.osgi.framework.bootdelegation property, which instructs the OSGi framework to expose the listed packages through the virtual system bundle (id = 0).

You can verify what it is currently serving with headers 0; I won't include the output here since it's a long list of JDK extension and framework packages.

If you add your packages, org.jboss.byteman.rule and org.jboss.byteman.rule.exception in my case, they too will be listed in the output of that command.

The problem with this solution is that it is a boot-time property: if you want to use Byteman to manipulate the bytecode of an already running instance, you have to restart it after editing the property.
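
For reference, that boot-time configuration is a one-line change along these lines (an excerpt of etc/config.properties; the packages already listed in the property vary with the Fuse/Karaf version, so treat this as illustrative):

# etc/config.properties: keep the packages already listed in the property
# and append the Byteman ones at the end
org.osgi.framework.bootdelegation = <existing packages>, org.jboss.byteman.rule, org.jboss.byteman.rule.exception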

OSGi fragments can help here and avoid any pre-configuration at boot time.

We can build a custom empty bundle, with no real content, that attaches to the system bundle and extends the list of packages it serves.

<Export-Package>
    org.jboss.byteman.rule,org.jboss.byteman.rule.exception
</Export-Package>
<Fragment-Host>
    system.bundle; extension:=framework
</Fragment-Host>

That's an excerpt of the maven-bundle-plugin configuration; see here for the full working Maven project, although the project is really just 30 lines of pom.xml:

JBossFuse:karaf@root> install -s mvn:test/byteman-fragment/1.0-SNAPSHOT

Once you have installed that fragment, you are ready to use Byteman to, for example, inject a line into the java.lang.String default constructor.

# find your Fuse process id
PROCESS_ID=$(ps aux | grep karaf | grep -v grep | cut -d ' ' -f2)

# navigate to the folder where you have extracted Byteman
cd /data/software/redhat/utils/byteman/byteman-download-2.2.0.1/

# export Byteman env variable:
export BYTEMAN_HOME=$(pwd)
cd bin/

# attach Byteman to the Fuse process, no output expected unless you enable the verbose flags
sh bminstall.sh -b -Dorg.jboss.byteman.transform.all $PROCESS_ID 
# add these flags if you have any kind of problem and want to see what's going on: -Dorg.jboss.byteman.debug -Dorg.jboss.byteman.verbose

# install our Byteman custom rule, we are passing it directly inline with some bash trick
sh bmsubmit.sh /dev/stdin <<OPTS

# smoke test rule that uses also a custom output file
RULE DNS StringSmokeTest
CLASS java.lang.String
METHOD <init>()
AT ENTRY
IF TRUE
DO traceln(" works: " );
traceOpen("PAOLO", "/tmp/byteman.txt");
traceln("PAOLO", " works in files too " );
traceClose("PAOLO");
ENDRULE

OPTS

Now, to verify that Byteman is working, we can just invoke the java.lang.String constructor in the Karaf shell:

JBossFuse:karaf@root> new java.lang.String
 works: 

And as per our rule, you will also see the content in /tmp/byteman.txt.
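
A quick look at that file from a regular shell should show the second trace line produced by the rule:

# check the file-based trace produced by the Byteman rule
cat /tmp/byteman.txt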

Inspiration for this third trick comes from both the OSGi wiki and this interesting page from the Spring guys.

If you have any comments, or any other interesting workflow to share, please leave a comment.