Tuesday, September 26, 2017

Ansible - A handy tool for people that might not need it

tl;dr

Ansible is a good tool. You can do what it gives you in many different ways, and you might already be doing it. With this assumption, I’ll try to show you why you might like it.

Ansible and I

I recently passed the exam for the Red Hat Certificate of Expertise in Ansible Automation.

I took the exam because I felt that I didn’t know much about Ansible, and because the topic was hot, with new modules and integrations popping up on a daily basis.
Each time I can match a new learning opportunity with the chance to verify my understanding of a topic against a formal Red Hat exam, I take it. That’s how I got that nice t-shirt when I got my RHCA =P

What’s Ansible?

Ansible is a configuration management and deployment DevOps tool.

Let’s break this apart, starting from the end: I haven’t thrown DevOps in here just as a catchy buzzword; I cited it to give you some context. We are in the DevOps world here, somewhere in between system administration, operations and developers’ deliverables.
And we are dealing with configuration management, meaning that we probably have quite a large matrix of specific configurations that has to be matched with an equally complex matrix of environments to apply them to.
And it seems that we can also use the tool to PUT deliverables and configuration on the target machines we are interested in managing.

So far so good.

Just to check if you are still with me, I’ll challenge you with a provocative question:

Tell me the name of a technology, out there since forever, that you have used to perform the same task in the past.

I’m not pretending to guess everyone’s answer, but my hope is that a large portion of readers would reply with some variation of this: scripts or bash.

If that was your reply too, we are definitely on the same page, and I can assure you that I’ll try to keep this comparison in mind throughout this blog post, hoping it is a useful way to tackle the topic.

Alternatives

If you work in IT, at almost any level, you probably have some familiarity with at least one of the alternative tools that perform a similar job.
The usual suspects are Chef, Puppet, Salt and bash/scripts for Configuration Management, and Capistrano, Fabric and bash/scripts for Deployment.
Again, I’m stressing the custom scripts part here. Anything that carries the custom adjective in its name is a possible code smell in an industry that aims to improve via standardization. At the same time, it implicitly suggests that if your software management model is mature enough to give you everything you need, you are probably already where you want to be.

A specific characteristic that distinguishes Ansible from most of its alternatives is the fact that it has an agentless architecture.
This means that you won’t have a daemon or equivalent process, always running on the managed system, listening for a command from a central server to perform any operation.

But… you will still have to rely on a daemon, always running, to allow your managed node to perform the operations you want.
Is this a fraud?
No, but I like to lay out the pros and cons of things, without cheating and hiding behind too-good-to-be-true marketing claims.
For Ansible to be able to work in its agentless way, you have to rely on an SSH daemon, sshd (or an equivalent, when you are outside *NIX), to allow remote connections.
So it’s not a fraud. An always-running sshd process is found on many systems, even when you don’t use Ansible to manage them. But at the same time you cannot say that, just because you don’t have an agent running on a managed node, you don’t have anything else running in its place!

Is this all? Ansible press says yes, I personally say no.

Uh? The common refrain out there is that SSH is the only thing you need to run Ansible. I feel that this is not completely correct, and if you take it as an absolute assertion it’s just plain wrong:

  1. Besides sshd, Ansible needs Python installed on the managed host. This might be overlooked if you only manage nodes based on modern Linux distros. But these days you might find yourself managing lower level devices, like IoT ones, that try to reduce the number of installed packages, and you might end up without Python. In those cases, most Ansible modules wouldn’t work.

  2. In the specific case of local deployment, you don’t even need sshd! I get that this is a specific use case, but I think it’s an important one. I might want to use Ansible as a full replacement for a large bash script, for example. Bash doesn’t assume SSH. It’s mainly a collection of command invocations tied together with execution logic. Ansible can be that for you. And if that’s the case, you don’t even need sshd running locally, since its implicit support for localhost allows you to run an Ansible script anyway.
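
To make the local use case concrete, here is a minimal sketch of a playbook that only targets the machine it is launched from. The play name, the path and the messages are hypothetical, just for illustration; connection: local is the setting that tells Ansible to skip SSH altogether.

```yaml
---
# Hypothetical playbook: runs on the control machine itself, no sshd involved.
- name: Local setup, replacing a bash script
  hosts: localhost
  connection: local        # run tasks directly instead of opening an SSH session
  tasks:
  - name: create a working directory
    file:
      path: /tmp/ansible-demo    # example path, pick your own
      state: directory
  - name: say hello from the local machine
    debug:
      msg: "Running on {{ inventory_hostname }}"
```

Assuming you save it as local.yml (a made-up name), you would run it with ansible-playbook local.yml, exactly like any other playbook.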

Why Python?

This is an important question for understanding how Ansible operates:

Ansible allows you to express scripts, which it calls playbooks, written in a custom YAML-based syntax/dialect.

Why Python then?
Because the playbook written in YAML is not the artifact that is actually run on the managed node. It’s read by the Ansible toolkit on the controlling machine, which translates each instruction expressed in the higher level YAML syntax into the invocation of a module, packaged as a valid Python program together with its arguments.
This Python program is then copied to the managed node and run there. Now you see why Python is a strict requirement for the managed nodes.

The DSL

Ansible uses a YAML-based DSL for its scripts.
This is one of those points where I have mixed feelings: in general, I’m pro Domain Specific Languages. The expectation is that they offer language-level facilities to perform the most common tasks in the domain more easily. The main reason for my mixed feelings is that I have quite a bad memory for language syntaxes. I can make things work with a sample to look at or with the internet, but I’m never sure what the language-specific convention is for functions, parentheses and such.

In the case of Ansible, and considering that the main alternative is bash, which is not really intuitive for anything related to formatting, semicolons, control structures and string manipulation, I can honestly say that the DSL idea works well.

This is the skeleton of an Ansible playbook:

---
- name: My Playbook
  hosts: host1,host2
  tasks:
  - name: say hello
    debug:
      msg: "Hello world"
  - name: copy file
    copy:
      src: /path/to/source
      dest: /path/to/dest
...

And I swear I typed it without copying it.
As you can see, it is reasonably intuitive.
As you can imagine, someone needs to tell you the list of allowed keywords you can use, and you have to know what’s mandatory and what’s optional, but you can probably read it and start setting your expectations.

If you are like me (or Alan Kay), you are probably already mumbling that yeah, Ansible looks simple when performing simple tasks, but I haven’t shown how fit it is for advanced stuff. Just be patient for a little longer; I’ll come back to this later.

The structure of the sample script above covers a large part of what you need to know about Ansible. There are 2 key elements:

  • hosts
  • tasks

Not surprisingly, you have a list of hosts. You might have written a similar bash script with the hostnames externalized in their own variable, but these are more complex than they look. They are not actually a plain list of addresses; they are keys in the dictionary that Ansible uses to keep track of managed hosts. You may wonder: why all this complexity?

There are 2 main reasons:

  • The first one is that, with a complex object/dictionary tied to each managed host, you can attach a lot of other data and metadata to the entry. You have a place to specify users, connection parameters and such, up to custom variables to which you want to assign a special value just for that specific node.
  • The second one is less straightforward, but it would be harder to implement if things weren’t broken down this way: this decoupled mechanism allows you to easily plug in dynamic data for the collection of hosts, which Ansible calls the inventory.
    In Ansible, an inventory can be a static file with the list of hosts and their properties, or it can be any executable file that, when invoked with special parameters defined by a service contract, returns a JSON object containing the same information you could put in a static file. If this seems overkill to you, just think about how much less static things are nowadays, with virtualization and cloud environments.
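
To give an idea of the data you can hang on each host, here is a sketch of a static inventory. I’m assuming an Ansible version that accepts inventories written in YAML (the classic format is INI-like); the group name, addresses and the http_port variable are made up, while ansible_user, ansible_port and ansible_host are standard connection variables.

```yaml
# Hypothetical static inventory in YAML format.
all:
  children:
    webservers:                       # example group name
      hosts:
        host1:
          ansible_user: deploy        # connection user for this host only
          ansible_port: 2222          # non-default SSH port
        host2:
          ansible_host: 192.0.2.10    # address to connect to, if different from the name
          http_port: 8080             # custom variable, usable in tasks as {{ http_port }}
```

You would pass it with something like ansible-playbook -i inventory.yml playbook.yml (file names are placeholders). A dynamic inventory is expected to return the equivalent information as JSON when Ansible invokes the executable with the --list parameter.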

Now the tasks part.
This is the easy part, because something else encapsulates the harder aspects.
tasks is just a list of individual operations to perform.
As simple as that.
The operations are provided to you by a large standard library of modules. In our example we use 2 very simple ones: debug, used just to print out strings, and copy, which, as you can guess, just copies a file.
This is probably the aspect where Ansible shines the most: it has a massive set of off-the-shelf components that you can start using just by passing them the required configuration.
The list here is really long. You can check it yourself with ansible-doc, a companion CLI tool that is your best friend, probably even more than Google, when you are working on a script:
ansible-doc -l gives you the full list of modules currently supported by Ansible, from modules that configure enterprise load balancers to others that configure and deploy AWS Lambdas. Literally everything for everyone!

ansible-doc is super useful because the documentation it lets you browse has been kept light on purpose, with a strong focus on examples.

For example, if you inspect the documentation of the get_url module and jump to the EXAMPLES section, you’ll probably find a snippet you can start customizing for your specific needs, e.g.:

ansible-doc get_url
...
EXAMPLES:
- name: download foo.conf
  get_url:
    url: http://example.com/path/file.conf
    dest: /etc/foo.conf
    mode: 0440
...

A large standard library plus light inline documentation is most likely something your custom bash scripts cannot offer you.

Up to this point, if you are a skeptic like me, you would probably tell me that you appreciate all this, but that other than a particularly large module library and excellent documentation you don’t see any compelling reason to move your consolidated practices to a new tool.

And I agree.

So I’ll weigh in with some further features.

ansible-playbook, the CLI command you pass your scripts to in order to run them, has some built-in facilities that, once again, are really handy for the person who is writing a script and has to debug it.

For example, you can pass it a flag that makes Ansible prompt you, before each task, to confirm whether you really want to run it or skip it: --step.
Or you might want to use a flag to provide the name of the specific task in your list to start the execution from: --start-at-task=.
Or you might want to run only the tasks tagged with a specific marker: --tags=.
You even have a flag, --check, to run the script in test mode, so that no real operation is performed but, where possible and where it makes sense, you get back an output that tells you what effect your invocation would have had on the remote system!

These (and many others I don’t have time to list here) are definitely nice facilities for anyone who has to write a script, in any language.
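
As a sketch of how these flags meet a playbook, here is a small example with tagged tasks. The task names, tags and the site.yml file name are hypothetical; the tasks only print messages, so it is safe to experiment with.

```yaml
---
- name: Playbook with tagged tasks
  hosts: host1
  tasks:
  - name: install packages
    debug:
      msg: "pretend to install"
    tags:
    - install
  - name: configure service
    debug:
      msg: "pretend to configure"
    tags:
    - config

# Possible invocations (site.yml is a made-up file name):
#   ansible-playbook site.yml --step                                 # confirm each task
#   ansible-playbook site.yml --start-at-task="configure service"    # skip what comes before
#   ansible-playbook site.yml --tags=config                          # only tasks tagged config
#   ansible-playbook site.yml --check                                # dry run, no real changes
```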

Another important feature/facility worth underlining is that the operations of Ansible modules are designed to be idempotent.
This means that they are written in such a way that, if you perform the exact same operation more than once, it leaves the system in the same state as after the first invocation.

This is probably better explained with an example:
Imagine that the operation you have to perform is to put a specific line in a file.
A naive way to obtain this in bash is with concatenation. If I just use the >> operator, I’m sure I will always add my entry to that file.
But what happens if the entry was already there? If I concatenate a second time, I add a second entry. And this is unlikely to be what I need most of the time.
To solve my issue correctly, I should append the entry only if it wasn’t there before.
It’s still not super complex, but I definitely have to keep more things in mind.

Ansible modules try to encapsulate this “idempotency” responsibility: you just declare the line of text you want in a file. It’s up to the module to verify that it won’t be added a second time if it is already present.
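
The “line in a file” case could look roughly like this with the lineinfile module. The file and the entry are just example values, and I’m assuming a reasonably recent Ansible, where the parameter is called path (older releases used dest).

```yaml
- name: make sure the entry is present exactly once
  lineinfile:
    path: /etc/hosts                        # example file
    line: "192.0.2.10 host2.example.com"    # example entry
    state: present                          # declare the desired state, not the steps
```

Run it once and the line is added; run it again and the task finds the line already there and leaves the file untouched.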

Another way to explain this approach is that Ansible syntax is declarative. You just need to define the final state of the system you are interacting with, not the complete logic performed to get to that state.

This nifty feature allows you to run the same script against the same node more than once without the risk of ruining the system configuration. This avoids disrupting a healthy system if a script is run twice by mistake, but it also speeds things up a lot in many cases: imagine a complex script that downloads files, extracts them, installs them and so on… If a system is already in the final state declared by your script, you can avoid performing redundant steps!

Idempotency is also important for understanding the ansible-playbook output.

At the end of an invocation of a script, you’ll get a recap similar to this:

PLAY RECAP ****************************************************************************
host1                  : ok=1    changed=1    unreachable=0    failed=0   

This summary recaps what happened in the script. You also have the full transcript in the lines above it, which I’m not showing here, but I want to focus on the recap.

The recap is strictly tied to the idempotency idea: the ok column counts the tasks that completed successfully, and the changed column counts how many of those actually performed some work and altered the system.
The difference between the two is the number of tasks that HAVE NOT altered the system, since it was already in the desired state!

Some built-in constructs, like handlers, and also the logic you might want to use in your scripts, are built around the concept of changed: do other stuff only if a specific task has changed the system; otherwise you probably don’t need to (since the system hasn’t changed).
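
Here is a sketch of both flavours of “react only on changed”: a handler triggered with notify, and a plain task guarded by the registered result of a previous one. The service name foo, the file paths and the variable name copy_result are hypothetical.

```yaml
---
- name: React only when the configuration really changed
  hosts: host1
  tasks:
  - name: deploy configuration file
    copy:
      src: files/foo.conf          # example source on the control machine
      dest: /etc/foo.conf
    notify: restart foo            # the handler fires only if this task reports changed
    register: copy_result          # keep the task result for later conditions
  - name: report the change
    debug:
      msg: "foo.conf was updated"
    when: copy_result.changed      # same idea, expressed as a condition
  handlers:
  - name: restart foo
    service:
      name: foo                    # made-up service name
      state: restarted
```

On a second run with an unchanged file, the copy task reports ok, the debug task is skipped and the handler never runs.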

Advanced stuff

Disclaimer: this is not really advanced stuff. But it’s the kind of stuff for which, in bash, you often rely on documentation or examples to check how it’s done.

Let’s start with the setup module:
Ansible’s default behavior, unless you ask to turn it off, is to run a discovery command called setup as the very first module, without you having to define it explicitly. This command gathers a lot of useful information about the system you are connecting to, and uses it to populate a set of variables you can use in your playbook.
It’s a handy way to get quick access to information like IP addresses, memory availability and many other data or metrics you might base your logic on.
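
A quick sketch of what using those variables might look like. The fact names below are ones the setup module normally provides on Linux hosts, but which facts are available depends on the target system (for instance, ansible_default_ipv4 may be missing on a host without a default route), so take the selection as an assumption.

```yaml
---
- name: Use facts gathered by the implicit setup step
  hosts: host1
  # gather_facts: false    # uncomment to turn the discovery step off
  tasks:
  - name: print a few facts
    debug:
      msg: "{{ ansible_hostname }} runs {{ ansible_distribution }} with {{ ansible_memtotal_mb }} MB of RAM"
  - name: print the default IPv4 address
    debug:
      msg: "{{ ansible_default_ipv4.address }}"
```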

What about loops?

Bash has that bizarre rule of newline vs. semicolon that makes me unable to remember how to write one without checking the internet.

In Ansible it is simpler:

...
- name: sample loop, using debug but works with any module
  debug:
    msg: "{{ item }}"
  with_items:
  - "myFirstItem"
  - "mySecondItem"
...

There is a collection of with_* keywords you can use to iterate over specific objects: in our case a plain YAML list, but it could be a dictionary, a list of file names with support for glob patterns, and many others.
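
For instance, the dictionary and glob variants could be sketched like this. The dictionary content and the glob pattern are invented for the example; note that with_fileglob matches files on the machine you run Ansible from, not on the managed node.

```yaml
- name: iterate over a dictionary
  debug:
    msg: "{{ item.key }} -> {{ item.value }}"
  with_dict:
    alice: admin          # example entries
    bob: developer

- name: iterate over files matching a glob
  debug:
    msg: "found {{ item }}"
  with_fileglob:
  - "/etc/*.conf"         # example pattern
```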

Conditional flow

There is no strict equivalent of if/else logic, at least not to create a traditional execution tree.
The pattern here is to push you to keep the execution logic linear, with just a boolean flag that enables or disables the specific task.

And it’s as simple as writing:

...
when: true
...

And you guessed right if you think that any expression that evaluates to true can be specified there, something like: ansible_os_family == "RedHat" and ansible_lsb.major_release|int >= 6.

Don’t be disappointed by my comment that Ansible doesn’t encourage nested if/else logic. You can still get that by combining other keywords, like block and when. But the point is to try to keep the execution flow linear, and thus simpler.
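
A rough sketch of how block and when can stand in for an if/else: two blocks with opposite conditions, each grouping the tasks of one branch. The tasks and messages are placeholders.

```yaml
# "if" branch: tasks for Red Hat family hosts
- block:
  - name: first step
    debug:
      msg: "RedHat branch, step 1"
  - name: second step
    debug:
      msg: "RedHat branch, step 2"
  when: ansible_os_family == "RedHat"    # applies to every task in the block

# "else" branch: tasks for every other host
- block:
  - name: generic step
    debug:
      msg: "generic branch"
  when: ansible_os_family != "RedHat"
```

The flow stays linear: every task is still visited in order, and the condition simply decides whether it runs or is skipped.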

The last advanced feature I want to cover is Ansible templates and string interpolation.

The tool has an embedded mechanism to allow you to build dynamic strings. This allows you to define a basic skeleton for a text where you define some placeholders, that are automatically replaced by the values of the corresponding variables that are present in the running context.

For example:

debug:
  msg: "Hello {{ inventory_hostname }}"

The same feature is also available in the form of the template module: you can define an external file containing your template, which Ansible processes by substituting values for the tokens it finds and, in a single pass, copies to the remote system. This is an ideal use case for configuration files, for example:

template:
  src: /mytemplates/foo.j2
  dest: /etc/file.conf
  owner: bin
  mode: 0644

I have decided to stop my overview here, even though Ansible offers many more of the features you would expect from a DevOps tool, and many others!

My hope is that this article triggers your curiosity to explore the topic on your own, having shown that, although you can probably obtain the same effect with other technologies, Ansible’s proposition is interesting and might deserve a deeper look.