Ansible Inventory 2.0 design rules

This is my third post in the Ansible Inventory series. See the first and the second posts for some background information.

Preliminary note: in this post, I try to give an example of a typical deployment inventory, and what rules we might need the inventory to abide by. Whilst I tried to keep things as simple as possible, I still feel this gets too complex. I’m not sure if it’s just a complex matter, if that’s me overcomplicating things, or if Ansible simply should not try to handle things this convoluted.

Let’s have an example of how inventory could be handled for a LAMP setup. Let’s assume we have these 3 sets of applications:

  • an Apache + PHP setup
  • a MySQL cluster setup
  • an Apache reverse Proxy

We also have 3 environments: development, testing and production.

We have 4 different PHP applications, A, B, C and D. We have 2 MySQL cluster instances, CL1 (for dev and testing) and CL2 (for production). We have a single reverse proxy setup that manages all environments.

The Apache PHP application gets installed on one of three nodes, 1 per environment: node1 (dev), node2 (test) and node3 (prod).

For each role in Ansible (assume 1 role per application here), we have to define a set of variables (a template) that gets applied to the nodes. If we focus on the apache-php apps for this example, the apache-php varset template gets instantiated 4 times, once for each of A, B, C and D. Assume the URL where the application gets published is part of each varset.

Each application gets installed on each node, respectively in one of the three environments. Each Apache-PHP node will need a list of those 4 applications, so it can define the needed virtual host, and set up each application in its subdirectory. Where each application was just a set of key values defining a single PHP app, we now need to listify those 4 varsets into a list that can be iterated over at the Apache config level.

Also, each Apache-RP node will need a list of applications, even when those applications are not directly installed on the reverse proxy nodes. The domain part (say contoso.com) is a specific domain for your organisation. Each application gets published beneath a specific context subfolder (contoso.com/appA, ..). For each environment we have a dedicated subdomain. We finally get 12 frontends: {dev,test,prod}.contoso.com/{appA,appB,appC,appD}. These 12 values must become part of a list of 12 values, and be exported to the reverse proxy nodes, together with the endpoint of the respective backend. (1)
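As a rough illustration of those 12 frontends, here is a minimal Python sketch. The backend node names follow the example (node1 for dev, node2 for test, node3 for prod); the dictionary layout and the names `BACKENDS`, `APPS` and `frontends` are assumptions made only for this sketch.

```python
from itertools import product

# Environments and their Apache+PHP backend node, as in the example.
BACKENDS = {"dev": "node1", "test": "node2", "prod": "node3"}
APPS = ["appA", "appB", "appC", "appD"]
DOMAIN = "contoso.com"

# One frontend per (environment, application) pair, with its backend.
frontends = [
    {"url": f"{env}.{DOMAIN}/{app}", "backend": BACKENDS[env]}
    for env, app in product(BACKENDS, APPS)
]

for frontend in frontends:
    print(frontend["url"], "->", frontend["backend"])
```

This list is what would have to be exported to the reverse proxy nodes, even though none of the applications live there.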

Similarly CL1 needs a list of the applications in dev and test, and CL2 needs a list of applications in prod. We need a way to say that a particular variable that applies to a group of nodes, needs to be exported to a group of other nodes.

So, the initial var sets we had at the app level get merged at some point when applied to a node. In this example, merging means making a list out of the different single applications. It also means overruling: the environment value gets overruled by membership of a certain environment group (like for the subdomain part).

Something similar could happen for the PHP version. One app could need PHP 5, whilst another would need PHP 7, which could bring in a constraint that gets the applications deployed on separate nodes within the same environment.

Of course, this can get very complicated, very quickly. The key is to define some basic rules the inventory needs (merge dictionaries, listify varsets, overrule vars, export vars to other hosts) and try to keep things simple.
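To make two of those basic rules a bit more concrete, here is a small Python sketch of a dictionary merge used to overrule a value, followed by listifying the results. The varset keys (`name`, `subdir`, `subdomain`) and the helper names are made up for this illustration; this is not how Ansible does it today.

```python
def merge_dicts(base, override):
    """Hash merge: keys from `override` win over keys from `base`."""
    merged = dict(base)
    merged.update(override)
    return merged


def listify(varsets):
    """Turn several single-application varsets into one list."""
    return list(varsets)


# Hypothetical per-application varsets.
app_a = {"name": "appA", "subdir": "/appA", "subdomain": "www"}
app_b = {"name": "appB", "subdir": "/appB", "subdomain": "www"}

# Membership of an environment group overrules the subdomain.
dev_env = {"subdomain": "dev"}

# What a dev Apache node would get: a list of overruled varsets.
node_apps = listify(merge_dicts(app, dev_env) for app in (app_a, app_b))
print(node_apps)
```

The original varsets stay untouched; only the merged copies that land on the node carry the overruled subdomain.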

 

Allow me to summarize a bunch of rules I came up with.

  • inventory is a group tree that consists of a set of subtrees, each of which instantiates some meaningful organisational function; typical subtrees are
    • organisation/customer
    • application
    • environment
    • location
  • variable sets define how they get merged
  • a subtree basically starts where a var set is defined to some child group 
  • all groups are equal, rules for groups are set by the variable sets assigned to them and how those should be inherited
  • those rules typically kick in when a group has multiple parents, when it’s a merge group
  • lookup plugins could be re-invented at this (merge) level to create new lists
  • an inventory tree typically has subtrees, and each subtree is the initial source for some variable sets (typically child group of an application subtree)
  • not clear yet: how to import and map an external inventory (dynamic script) into the local inventory scheme 
  • a variable is part of a variable set, and is defined by a schema; variables can merge by doing a hash merge, by listifying a var, or by adding lists, and by defining a form of precedence (a weight, assigned to group subtrees, not by group depth any more)
    • it is namespaced by the variable set (could be linked to a specific application, perhaps maps onto an Ansible role)
    • it has a name
    • a type (single value, string, int, .. or a list or a dictionary…)
    • define a merge strategy (listify, merge list, add list, dictionary merge, deep merge, …)
    • when applied to a group (subtree), it defines a weight, check that no trees have the same weight!
    • it has a role: it is a parameter (a plain inventory variable), or it is a runtime variable (feedback from playbook execution), or it is a fact (the latter two could perhaps be the same)
    • track its source (applied to a group, some external inventory, …)
    • define a group_by rule, grouping/listifying it for several hosts (like Puppet’s exported resources)
    • track which node is a master node
  • merge groups could also be “cluster groups” = the groups that hold and instantiate nodes that are part of a common application pool
  • whilst nodes can host different applications and hence be part of multiple cluster/merge groups, they can also be part of multiple other trees (think of separate nodes of a cluster that are part of different racks, or datacenters?)
  • merging variables can happen everywhere a node or group is member of different parents that hold the same variable set; hence at group level or at node level
  • nodes are children of merge groups and of groups from other subtrees
  • nodes can be members of multiple cluster/merge groups
  • which node in a cluster group is the master node is related to a var set
  • being the master can be initially a plain parameter, but is overruled by its runtime value (think of master fail over)
  • when applying var sets to groups, define a weight; when merging vars within the same subtree, look at a merge strategy; hash merges might need a weight too?
  • variable sets are defined in a group in some subtree, and can be overridden in groups from other trees
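A minimal sketch of what such a variable schema could look like in Python follows. All class names, field names and the two strategies shown (`overrule`, `listify`) are hypothetical; they only illustrate the idea of a namespaced variable that carries its own merge strategy, with a weight and a tracked source per assignment.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Assignment:
    """A value for one variable, applied to a group in some subtree."""
    value: Any
    weight: int   # weight of the subtree the group belongs to
    source: str   # where the value came from (a group, external inventory, ...)


@dataclass
class VariableDef:
    """Schema entry for one variable in a variable set (hypothetical)."""
    varset: str          # namespace, e.g. an application or role
    name: str
    merge_strategy: str  # "overrule" or "listify" in this sketch

    def merge(self, assignments: List[Assignment]) -> Any:
        if self.merge_strategy == "listify":
            # Every value survives, in the order given.
            return [a.value for a in assignments]
        if self.merge_strategy == "overrule":
            weights = [a.weight for a in assignments]
            if len(set(weights)) != len(weights):
                raise ValueError(f"{self.varset}.{self.name}: two trees share a weight")
            # The assignment with the highest weight wins.
            return max(assignments, key=lambda a: a.weight).value
        raise ValueError(f"unknown merge strategy: {self.merge_strategy}")


vhost = VariableDef("apache", "vhost", "listify")
subdomain = VariableDef("apache", "subdomain", "overrule")

vhosts = vhost.merge([Assignment("appA", 10, "group:appA"),
                      Assignment("appB", 10, "group:appB")])
winner = subdomain.merge([Assignment("www", 10, "group:application"),
                          Assignment("dev", 50, "group:development")])
print(vhosts, winner)
```

Raising an error on equal weights mirrors the rule that no two trees should share a weight; whether hash merges need a weight too is left open here, as it is in the list.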

 

Overview in a nutshell:

(1) This is probably the point where service discovery becomes a better option

The post Ansible Inventory 2.0 design rules appeared first on Serge van Ginderachter.

Some first design ideas for an Ansible Inventory 2.0

In my previous post “Current state of the Ansible inventory and how it might evolve” I explained some parts of the Ansible Inventory internals, and pointed out some features I would like to improve.

Whilst this exercise might be interesting to Ansible and specifically its internal inventory, it might also just be an idea for an external application that yields a flattened inventory (through an inventory plugin/dynamic script), or it might be interesting to see if other configuration management tools might make use of it, as some sort of central “single source of truth”.

Whereas currently the inventory has simple groups, which hold child groups, have parent groups, and can contain hosts, I believe a more rigid structure with more meta information would be beneficial. Not only to manage group trees, but also to manage the variables assigned and defined in those groups, and to manage their values throughout the parent-child inheritance.

Next up some design ideas I have been playing with. A big part of this, is that, to me, managing inventory is much more about managing variable sets and their values, not just grouping hosts.

  1. inventory starts top level with a special root group. A bit like the all group we currently have. The root group is the only one that has 0 parents, and has one or more normal child groups. These child groups are the root groups for a subtree;
  2. a subtree holds sets of variables. ideally, a particular variable only lives in one single subtree;
  3. a normal group has 1 parent, and one or more child groups;
  4. a merge group is a special group that can have more than one parent group, but each parent must be a member of a different subtree;
    • a merge group would typically merge sets of variables from different subtrees;
    • ideally a var does not exist in different parent trees, as to not have to deal with arbitrary precedence;
    • but maybe such a var holds e.g. a virtual host, and should at merge time become a list of virtual hosts, to be deployed on an apache instance;
    • care should be taken when a particular variable exists in different trees;
  5. a merge group could also be a cluster or instance group, or have such groups as a child, which means it has no child groups, but holds only hosts;
    • merge groups could also be dynamic: a child of postgres group and child of testing group would yield a postgres-test group
    • those groups need to track which subtrees they have in their ancestors
    • instead of tracking subtrees, perhaps track variable sets (and have a rule where a var can only exist in one set)
  6. a cluster group could keep track of which host in its group is the master (e.g. it’s a MySQL master-slave cluster); such a property is of course dynamic; this would help to write playbooks that only have to run once per cluster, and on the master;
  7. a host can be a member of different merge or cluster groups, e.g. when that host holds multiple roles. e.g. as a single LAMP stack, it runs MySQL (with different databases) and Apache (with different virtual hosts)
    • inheriting from multiple groups that are members of the same subtree means something like having multiple instances of an application, or virtual hosting applied on a host
    • this might be where the description for an application gets translated to what is needed to configure that application on one or more hosts
    • multiple app instances, can be bundled on a host, and more of them can be spread on multiple hosts
    • a single variable might need to become a list on a specific instance
  8. merging groups is actually about merging the variables they hold
  9. a variable set is (meta) defined in a subtree; some vars might have a default, and some vars need to be updated when that default changes (perhaps a new DNS server in your DC), whilst others may not be updated (the Java version your application was deployed with);
  10. at some point I tinkered on the idea of location groups/trees, which might be a thing more separate from classic organisational and application focused groups, to manage things like geographic location datacenter etc. but I’m not sure this still warrants a special kind of groups;
    • a geographical group membership could perhaps change the domain name of an url

But the point of all this is primarily to manage variables in the inventory. To be able to parametrize an application, to describe that application in perhaps a more high level way. Inventory should then allow you to transpose those values in a way that they easily apply to the host based execution level (the playbooks, and roles). This also includes a way to Puppet style “export” resources to other hosts.

Roles can be written and used in two ways, when deploying multiple instances of an application: (1) a role defines a basic application, and is called multiple times, perhaps as a parameterized role (but role: with_items: might be needed and that is not possible currently in Ansible); and (2) the role itself loops over a list of instances, where inventory translates membership of multiple apache virtualhosts instances to a list of virtual hosts per Ansible host.

The latter might be a more generic way of exporting resources. An example. Some subtree manages the setup of a single apache site. At some point multiple sites are defined. Sites will be grouped and installed on one of multiple apache setups. Here you happen to export virtual hosts into a list of virtualhosts for one apache. In a next step, *all* those virtualhosts get exported in a big list that configures your load balancer.

We need some generic way to create lists of things grouped by a certain parameter.
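In its simplest form, such grouping could look like the Python sketch below. The site definitions, host names and the `collect` helper are invented for the example; the point is only that one flat set of definitions yields both the per-Apache lists and the one big list for the load balancer.

```python
from collections import defaultdict


def collect(items, key):
    """Group a flat list of dicts into lists keyed by item[key]."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item[key]].append(item)
    return dict(grouped)


# Hypothetical site definitions, each assigned to one Apache setup.
sites = [
    {"site": "appA", "apache": "web1"},
    {"site": "appB", "apache": "web1"},
    {"site": "appC", "apache": "web2"},
]

# One list of virtual hosts per Apache setup...
vhosts_per_apache = collect(sites, "apache")
# ...and all of them in one big list for the load balancer.
load_balancer_sites = [site["site"] for site in sites]

print(vhosts_per_apache)
print(load_balancer_sites)
```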

Variables get inherited throughout the inventory trees. This could happen in a way where some precedence makes one value overwrite another, or in a way where multiple values become a list of values. This might be part of some schema for variable sets in a specific tree? Another idea might be to not care about group types, and just apply rules to groups via the variable sets they carry, track which sets a group inherits from, and perhaps namespace them. Define how variable sets should merge, listify, or are not allowed to be combined.

How do we plug external data into this model? Should the equivalent of current dynamic inventory scripts be mapped onto a subtree? Or span multiple locations? Be mapped onto a specific variable set? Hard to capture in a general rule. Lots of those inventory scripts focus on hosts and groups, and perhaps some facts, whilst this model has a bigger focus on managing variables.

Putting some more logic in the inventory could also mean that part of the manipulation that lookup plugins perform could happen in inventory. This would greatly simplify how we write loops in roles, by being able to do everything with a simple standard with_items.

As Dag Wieers summarised his view on inventory to me, a new inventory should allow us to

  1. combine data from different sources, into a single source of truth
  2. do dynamic facts manipulation
  3. have a deterministic but configurable hierarchy

Another thing users tend to handle in different ways is where the host creation happens. Some start by defining the host in the Ansible inventory, then create it with e.g. a vmware role; others import the host list from an external inventory, e.g. ec2. The way we import inventory data from external sources should be well defined: how we map external groups, hosts and variables into this inventory model. Of course a new inventory should have a more elaborate API, not only internally, but also exposed through the JSON API for dynamic inventory scripts.
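For reference, the JSON that a dynamic inventory script hands to Ansible today is fairly flat: groups with `hosts`, `vars` and `children`, plus a `_meta` section with `hostvars`, returned when the script is called with `--list` (or per host with `--host`). A minimal Python sketch of such a script, with made-up group, host and variable names:

```python
import json
import sys


def build_inventory():
    """Return the structure a script prints for --list (example data)."""
    return {
        "web": {"hosts": ["node1", "node2"], "vars": {"http_port": 80}},
        "db": {"hosts": ["node3"]},
        "lamp": {"children": ["web", "db"]},
        "_meta": {
            "hostvars": {
                "node1": {"env": "dev"},
                "node2": {"env": "test"},
                "node3": {"env": "prod"},
            }
        },
    }


def main(argv):
    """Answer the two calls a dynamic inventory script has to support."""
    inventory = build_inventory()
    if argv == ["--list"]:
        return json.dumps(inventory)
    if len(argv) == 2 and argv[0] == "--host":
        return json.dumps(inventory["_meta"]["hostvars"].get(argv[1], {}))
    return json.dumps({})


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

There is no room in this format for merge strategies, weights or the source of a value, which is the kind of metadata a more elaborate API would have to carry.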

Now, all of this probably sounds overly complex, and overdoing this new design is a serious risk. But I do hope to come to a model with just some basic simple rules that allow us to implement all these ideas. If you have ideas on this, feel free to comment here or get in touch with me to further discuss this!

 

The post Some first design ideas for an Ansible Inventory 2.0 appeared first on Serge van Ginderachter.

Current state of the Ansible inventory and how it might evolve

This is an introductory post about the Inventory in Ansible, where I’m looking at the current design and implementation and some of its internals, and where I hope to spark some discussion and ideas on how it could be improved and extended. A discussion at Devopsdays Ghent last week rekindled my interest in this topic, with some people actively showing interest in participating. Part of that interest is about building some standard inventory tool with an API and frontend, similar to what Vincent Van der Kussen started (and also lots of other, now often abandoned, projects) but going way further. Of course, that exercise would be pointless without looking at what parts need to happen upstream, and what parts fit better in a separate project. That’s also why my initial call to people interested in this didn’t focus on immediately bringing that discussion to one of the Ansible mailing lists.

In its current state, the Ansible Inventory – at the time of writing, 2.2 was just released – hasn’t changed from how it was modelled during its early 0.x releases. This post tries to explain a bit how it was designed, how it works, and what its limits might be. I might oversimplify some details whilst focusing on the internal model and how data is kept in data structures.

Whilst most people tend to see the inventory as just the host list, and a way to group hosts, it is much more than that, as the parameters to each host, the inventory variables, are also part of the inventory. Initially, those group and host variables were implemented as vars plugins, and whilst the documentation still seems to imply this, this hasn’t been true since a major bugfix and update to the inventory in the 1.7 release, now over two years ago; this part is now a fixed part of the ansible.inventory code. As far as I know, nobody seems to use custom vars plugins.

I’d argue that the part where one manages variables, the parameters to playbooks and roles, is the most important part of the inventory. Structurally, the inventory model comes down to this:

The inventory basically is a list of hosts, which can be members of one or more groups, and where each group can be a child of one or more other (parent) groups. One can define variables on each of those hosts and groups, and define different values for all of those hosts. Groups and hosts inherit variables from their parents.

The inventory is mostly pre-parsed at the beginning of an Ansible run. After that, you can consider the inventory as being a set of groups, where hosts live, and each host has a set of variables with one specific value attached to it. A misunderstanding that comes up on the mailing list every now and then is thinking a host can have a different value for a specific variable, depending on which group was used to target that host in a playbook. Ansible doesn’t work like that. Ansible takes a hosts: definition and calculates a list of hosts. In the end, which exact group was used to get to that host doesn’t matter anymore. Before hosts are touched, and variables are used, those variables are always calculated down to the host. Assigning different values to different groups is how you can manage those, but in the end, you could choose to never use group_vars, and put everything yourself, manually, in host_vars, and get the same end result.

Now, if a host is a member of multiple groups, and the same variable is defined in (some of) those groups, the question is, which value will prevail? Variable precedence in Ansible is quite a beast, and can be quite complex or at least daunting to both new and experienced users. The Ansible docs overview doesn’t explain all the nifty details, and I couldn’t even find an explanation of how precedence works within group_vars. Now, the short story here is: the more specific value, further down the tree, wins. A child group wins over a parent group, and host vars always win over group vars.

When host kid is a member of a father group, and that father group is a member of a grandfather group, then the kid will inherit variables from grandfather and father. Father could overrule a value from grandfather, and kid can overrule his father and grandfather if he wants. Modern family values.
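The family example can be written down as a tiny Python sketch. This is a simplified model of "more specific wins", not Ansible's actual code; the variable names and values are invented.

```python
def flatten(host_vars, group_chain):
    """Calculate the variables down to the host.

    `group_chain` is ordered from the most generic group to the most
    specific one; later entries overrule earlier ones, and host vars
    overrule all group vars.
    """
    result = {}
    for group_vars in group_chain:
        result.update(group_vars)
    result.update(host_vars)
    return result


grandfather = {"surname": "Smith", "car": "sedan", "music": "jazz"}
father = {"car": "estate"}
kid = {"music": "techno"}

kid_vars = flatten(kid, [grandfather, father])
print(kid_vars)
```

Once flattened, nothing in the result remembers which group a value came from, which is why targeting a host through a different group cannot change its values.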
There are also two special default groups: all and ungrouped. The former contains all groups that are not defined as a child group of another group, the latter contains all hosts that are not made a member of a group.

But what if I have two application groups, app1 and app2, which are not parent-child related, and both define the same variable? In this case, both groups app1 and app2 live on the same level, and have the same ‘depth’. Which one will prevail depends on the alphanumerical sorting of both names – IIRC – but I’m not even sure of the details.

That depth parameter is actually an internal parameter of the Group object. Each time a group is made a member of a parent group, that group gets the depth of its parent + 1, unless that group’s depth was already bigger than that newly calculated depth. The special all group has a depth of 0, app1 and app2 both have a depth of 1, and app21 gets a depth of 2. For a variable defined in all those groups, the value in app21 will be inherited by node2, whilst node1 will get the value from either app1 or app2, which is more or less undefined. That’s one of the reasons why I recommend not defining the same variables in multiple group “trees”, where a group tree is one big category of groups. It’s already hard to manage groups within a specific subtree whilst keeping precedence (and hence depth) in mind; it’s totally impractical to track that amongst groups from different subtrees.

Oh, if you want to generate a nice graphic of your inventory, to see how it lays out, Will Thames manages a nice project (https://github.com/willthames/ansible-inventory-grapher) that does just that. As long as graphviz manages to generate a small enough graph that fits on a sheet of paper, whilst remaining readable, you probably have a small inventory.
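The depth rule described above can be sketched in a few lines of Python. This `Group` class is a stand-in written for this post, not the real Ansible Group object, and the tie-break on group name follows my recollection rather than documented behaviour. The group memberships of node1 and node2 are assumed for the example.

```python
class Group:
    """Minimal stand-in for a group that only tracks its depth."""

    def __init__(self, name):
        self.name = name
        self.depth = 0

    def add_child(self, child):
        # The child gets the parent's depth + 1, unless it is already deeper.
        # (Descendants of the child are not updated in this sketch.)
        child.depth = max(child.depth, self.depth + 1)


def prevailing(groups):
    """Pick the group whose value wins: deepest first, then by name.

    The name tie-break is an assumption; the exact rule is uncertain.
    """
    return max(groups, key=lambda group: (group.depth, group.name))


all_group, app1, app2, app21 = (Group(n) for n in ("all", "app1", "app2", "app21"))
all_group.add_child(app1)
all_group.add_child(app2)
app2.add_child(app21)

node1_winner = prevailing([app1, app2])   # same depth: decided by name only
node2_winner = prevailing([app2, app21])  # app21 is deeper, so it wins
print(node1_winner.name, node2_winner.name)
```

For node2 the outcome follows from the tree; for node1 it follows from nothing more than how the groups happen to be named.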
Unless you write your own application to manage all this, and write a dynamic inventory script to export data to Ansible, you are stuck with mostly the INI style hosts files, and the YAML group_vars and host_vars files, to manage those groups and variables. If you need to manage lots of hosts and lots of applications, you probably end up with lots of categories to organise these groups, and then it’s very easy to lose any overview of how those groups are structured, and how variables are inherited over those groups. It becomes hard to predict which value will prevail for a particular host, which doesn’t help to ensure consistency. If you were to change default values in the all group, whilst some hosts, but not all, have an overruled value defined in child groups, you suddenly change values for a bunch of hosts. That might be your intention, but when managing hundreds of different groups, you know such a mistake might easily happen when all you have are the basic Ansible inventory files.

Ansible tends to (or at least often used to) recommend not doing such complicated things, “better remodel how you structure your groups” – but I think managing even moderately large infrastructures can quickly put complex demands on how you structure your data model. Different levels of organisation (subtrees) are often needed here. Many of us need to manually create intersections of groups, such as app1 ~ development => app1-dev, to be able to manage different parameters for different applications in different environments. At scale this quickly becomes cumbersome with the INI and YAML file based inventory. Maybe we need a good pattern to handle dynamic intersections implemented upstream? Yes, there is hosts: app1&dev, but that is parsed at run time, and you can’t assign vars to such an intersection.
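Generating those intersection groups is not hard in itself, as this Python sketch with invented group and host names shows; the hard part is having a place in the inventory where the result can live and carry variables.

```python
from itertools import product


def intersections(apps, envs):
    """Build app-env groups from every non-empty app/environment overlap."""
    result = {}
    for (app, app_hosts), (env, env_hosts) in product(apps.items(), envs.items()):
        common = sorted(set(app_hosts) & set(env_hosts))
        if common:
            result[f"{app}-{env}"] = common
    return result


# Hypothetical application and environment groups with their hosts.
apps = {"app1": ["node1", "node2", "node3"], "app2": ["node2"]}
envs = {"dev": ["node1"], "test": ["node2"], "prod": ["node3"]}

intersection_groups = intersections(apps, envs)
print(intersection_groups)
```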
An interesting approach is how Saltstack – which doesn’t have the notion of groups in the way Ansible does – lets you target hosts using grains, a kind of dynamic group that filters hosts based on facts. Perhaps this project does something similar?

Putting more logic into how we manage the inventory could be beneficial in several ways. Some ideas.

Besides scaling the current model by having better tooling, it could for example allow us to write simpler roles/playbooks, e.g. where the only lookup plugin we need to use for with_ iterations would be the standard with_items, as the other ones are actually used to handle more complex data and generate lists from it. That could be part of the inventory, where the real infrastructure-as-data is modelled, giving a better decoupling of code (roles) and config (inventory). Possibly a way to really describe infrastructure as a higher level data model, describing a multi-tier application, that gets translated into specific configurations and actions on the node level.

How about tracking the source of (the value for) a variable? That would allow us to distinguish between a variable that inherits a default and is updated when that default changes in a more generic group (e.g. the DNS servers for the DC), and a variable that is only instantiated from a default and does not change afterwards by inheriting that default (e.g. the Java version development starts out with during the very first development deploy).

Where are gathered facts kept? For some, just caching those as currently happens is more than enough. For others, especially more specific custom facts – I’d call those run time values – that can have a direct relationship with inventory parameters, it might make more sense to keep them in the inventory. Think about the version of your application: you configure that as a parameter, but how do you know if that version is also the one that is currently deployed? One might need to keep track of the deployed version (runtime fact) and compare that to what was configured to be deployed (inventory parameter).

An often needed pattern when deploying clusters is the concept of a specific cluster group. I’ve seen roles acting on the master node in a cluster by conditionally running when: inventory_hostname == groups['myapplicationcluster'][0].

How many of you are aware that the order of the node list of a group was reversed somewhere between two 1.x releases? It was initially reverse alphanumerically ordered.

This should nicely illustrate the problem of relying on undocumented and untested behaviour. This pattern also makes you hard-code an inventory group name in a role, which I think is ugly, and makes your role less portable. Doing a run_once can solve a similar problem, but what if your play targets a group of multiple clusters, instead of a specific cluster, and you need to run_once per cluster? Perhaps we need to introduce a special kind of group that can handle the concept of a cluster? Metadata for groups, hosts and their variables?

Another pattern that is hard to implement in Ansible is what Puppet solves with its exported resources. Take a list of applications that are deployed on multiple (application) hosts, and put that in a list so we can deploy a load balancer on 1 specific node. Or monitoring, or ACLs to access those different applications. As Ansible in the end manages just vars per host, doing things amongst multiple hosts is hard. You can’t solve everything with a delegate_to.

At some point, we might want to parse the code (the roles) that will be run, so as to import a list of variables that are needed, and build some front-end, so the user can instantiate their node definitions and parameterize the right vars in inventory.

How about integrating a well managed inventory with external sources? Currently, that happens by defining the Ansible inventory as a directory, and combining INI files with dynamic inventory scripts. I won’t get started on the merits of dir.py, but let’s say we really need to redesign that into something more clever, that integrates multiple sources, keeps track of metadata, etc. William Leemans started something on this after some discussion at Loadays in 2014, implementing specific external connectors.

Perhaps on a side note, I have also been thinking a lot about versioning. Versioning of the deploy code, and especially roles here. I want to be able to track the life cycle of an application, its different versions, and the life cycle and versions of its deployment. Imagine I start a business where I offer hosted Gitlab for multiple customers. Each of those customers potentially has multiple organisations, and hence Gitlab instances. Some of them want to always run the latest version, whilst some want to remain for ages on the same enterprizy first install, and others are somewhere in between. Some might even have different DTAP environments – Gitlab might be a bad example here as not all customers will do custom Gitlab development, but you get the idea. In the end you really have LOTS of Gitlab instances to manage. Will you manage them all with one single role? At some point the role needs development. Needs testing. A new version of the role needs to be used in production, for a specific customer’s instance. How can I manage these different role versions in the Ansible eco-system? Doing that in the inventory sounds right, as the central source of information?

Lots of ideas. Lots of things to discuss. I fully expect to hear about many other use cases I never thought of, or would never even need. And so as not to re-invent the wheel, insights from other tools are very welcome here!

Packt Publishing Ansible Configuration Management review

Around late November 2013 I – too – got contacted by Packt Publishing, asking to do a review on Ansible Configuration Management. I was a bit surprised, as I had declined their offer to write that book, which they asked me exactly two months earlier. Two months seemed like a short period of time to manage to write a book and get it published.

Either way, I kind of agreed, and got the book in pdf, printed it out, started some reading, lent it to a colleague (we use Ansible extensively at work), and just recently got it back so I could finish having a look at it.

“Ansible Configuration Management” is an introductory book for beginners. I won’t introduce Ansible here, there are a lot of good resources on that, just duck it. Ansible being relatively new, has evolved quite a bit in the previous year, releasing 1.4 by the end of November. The current development cycle focuses more on bug fixes, and under the hood stuff, and less on new syntax, which was quite the opposite when going from 0.9 through 1.2, and up until the then and now current 1.3.

Knowing what major changes would get into 1.3 was easy when you followed the project. One of the major changes is the syntax for variables and templates. Basically, don’t use $myvar or ${othervar} any more, but only use {{ anicevar }}. If you know Ansible, you know this is an important thing. I was very disappointed to notice the author didn’t stress this. Whilst most examples use the new syntax, at one point all syntaxes are presented as equally possible – which is correct for the then latest 1.3, but it was well known at the time that the old ones would be deprecated.

Of course, a tech book on a rapidly evolving Open Source tool will always be outdated by the time it gets published. But I think this should be expected, and a good book on such a subject should of course focus on the most recent possible release, but also try to mention the newer features that are to be expected. Especially for a publisher that also focuses on Open Source.

A quirk, is when code snippets are discussed. Some of those longer snippets are printed across more than one page, and the book mentions certain line numbers. Which is confusing, and even unreadable, when the snippets don’t have line numbers. Later in the book, sometimes line numbers are used, but not in a very standard way:

 

[Image: code snippet with weird line number]

Whilst most of this book has a clear layout at first sight, things like this don’t feel very professional.

This book gives a broad overview and discusses several basic things in Ansible. It goes from basic syntax, over inventory, small playbooks, and extended playbooks, and also mentions custom code things. It gives lots of examples, discusses special variables, modules, plugins… and many more. Not all of them, but that is not needed, given the very good documentation the project publishes. This book is an introduction to Ansible, so focusing on the big principles is more important at this point than having a full inventory of all features. As it’s a relatively short book (around 75 pages), it’s small enough to be appealing as a quick introduction.

It’s a pity the publisher and the author didn’t pay more attention to details. The less critical user, with little to no previous ansible experience, will however get a good enough introduction with this book, with some more hand-holding and overview than what can easily be found freely on-line.

Git and Github: keeping a feature branch updated with upstream?

Git and github, you gotta love them for managing and contributing to (FLOSS) projects.

Contributing to a Github hosted project becomes very easy. Fork the project to your personal Github account, clone your fork locally, create a feature branch, make some patch, commit, push back to your personal Github account, and issue a pull request from your feature branch to the upstream (master) branch.


git clone -o svg git@github.com:sergevanginderachter/ansible.git
cd ansible
git remote add upstream git://github.com/ansible/ansible.git
git checkout -b user-non-unique
vi library/user
git add library/user
git commit -m "Add nonunique option to user module, translating to the -o/--non-unique option to useradd and usermod."
git push --set-upstream svg user-non-unique
[go to github and issue the pull request]

Now, imagine upstream (1) doesn’t approve your commit and asks for a further tweak and (2) you need to pull in newer changes (upstream changes that were committed after you created your feature branch.)

How do we keep this feature branch up to date? Merging the newest upstream commits is easy, but you want to avoid creating a merge commit, as that won’t be appreciated when pushed to upstream: the merge commit is not part of your actual change, yet it ends up in the history of your feature branch. This is especially important, as that merge would be reflected in your Github pull request when you push those updates to your personal github feature branch (even if you do that after you issued the pull request.)

That’s why we need to rebase instead of merging:


git checkout devel   # devel is ansible's main development branch, its equivalent of "master"
git pull --rebase upstream devel
git checkout user-non-unique
git rebase devel

Both the --rebase option to git pull and the git rebase command will keep your tree clean, and avoid merge commits.
But keep in mind that it is your first commits (the ones you issued your first pull request with) that are being rebased, and that they now have new commit hashes, which differ from the original hashes that are still in your remote github repo branch.
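To see the hash change for yourself without touching a real project, here is a small throwaway demo. It is a sketch under a few assumptions: the repository lives in a temporary directory, the file names and commit messages are invented, and only the branch names devel and user-non-unique are borrowed from the example above.

```shell
#!/bin/sh
# Throwaway demo: rebasing a feature branch gives its commits new hashes.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q repo
cd repo
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# One commit on the main development branch.
echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch -M devel

# The feature branch with a single commit.
git checkout -q -b user-non-unique
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature"
before=$(git rev-parse HEAD)

# Meanwhile, devel moves on.
git checkout -q devel
echo newer > newer.txt
git add newer.txt
git commit -q -m "newer upstream work"

# Replay the feature commit on top of the new devel.
git checkout -q user-non-unique
git rebase -q devel
after=$(git rev-parse HEAD)

echo "before rebase: $before"
echo "after rebase:  $after"
```

The change itself is identical, but the commit now has a different parent, so the two hashes differ.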

Now, pushing those updates out to your personal Github feature branch will fail here, as both branches differ: the local branch tree and the remote branch tree are “out of sync”, because of those different commit hashes. Git will tell you to first git pull --rebase, then push again, but this won’t be a simple fast-forward push, as your history got rewritten. Don’t do that!

The problem here is that you would again fetch your first changed commits as they were originally, and those will get merged on top of your local branch. Because of the out of sync state, this pull does not apply cleanly. You’ll get a b0rken history where your commits appear two times. When you would push all of this to your github feature branch, those changes will get reflected on the original pull request, which will get very, very ugly.

AFAIK, there is actually no totally clean solution to this. The best solution I found is to force push your local branch to your github branch (actually forcing a non-fast-forward update).

As per git-push(1):

Update the origin repository’s remote branch with local branch, allowing non-fast-forward updates. This can leave unreferenced commits dangling in the origin repository.

So don’t pull, just force push like this:

git push svg +user-non-unique

This will simply overwrite your remote branch with everything in your local branch. The commits which are in the remote branch (and caused the failure) will remain there, but will be dangling commits, which will eventually get deleted by git-gc(1). No big deal.

As I said, this is AFAICS the cleanest solution. The downside is that your PR will be updated with those newest commits, which will get a later date, and could appear out of sync in the comment history of the PR. No big problem, but it could potentially be confusing.
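The rejected push and the forced push described above can be reproduced safely with a local bare repository standing in for the Github fork. This is a sketch, not the real setup: the directory layout, file names and commit messages are made up, and only the remote name svg and the branch names come from the example.

```shell
#!/bin/sh
# Throwaway demo: a plain push is rejected after a rebase, "+branch" forces it.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q --bare fork.git        # stands in for the personal Github fork
git init -q work
cd work
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
git remote add svg ../fork.git

echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch -M devel

# Feature branch, pushed once (as for the first pull request).
git checkout -q -b user-non-unique
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature"
git push -q svg user-non-unique

# devel moves on, and the feature branch is rebased on top of it.
git checkout -q devel
echo newer > newer.txt
git add newer.txt
git commit -q -m "newer upstream work"
git checkout -q user-non-unique
git rebase -q devel

# A plain push is now refused, because it would not be a fast-forward.
if git push -q svg user-non-unique 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi
echo "plain push: $plain_push"

# The leading + allows the non-fast-forward update for this one branch.
git push -q svg +user-non-unique
local_hash=$(git rev-parse user-non-unique)
remote_hash=$(git --git-dir=../fork.git rev-parse user-non-unique)
echo "local:  $local_hash"
echo "remote: $remote_hash"
```

After the forced push both hashes are the same. Newer Git versions also have git push --force-with-lease, which refuses to overwrite the remote branch if it moved since your last fetch; that option is not what this post uses, so take it as a pointer only.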

The Linux-Training Project: Linux Training v2 released

As announced in February, new versions of the Linux training courses were being (re-)written by Paul.

I’m pleased to announce that v2 was merged in the master branch on github.

If you want to test it or just check it out:


git clone git://github.com/linuxtraining/lt.git
cd lt
git submodule init
git submodule update
./make.sh
./make.sh build fundamentals

Feedback is welcome, by mail at contributors@linux-training.be or via a github issue.

If you prefer to just download the latest books in PDF format, check out the download page. These are nightly builds from the master branch.

SSH RemoteCommand over netcat hopping, or not.

Patrick Debois‘ article on Chaining SSH tunnels inspired me to effectively start using this technique.

At first my use case was pretty simple. It wasn’t the host I needed to connect to which was behind a firewall, but, as it turned out, I was.

I’ve got a box at home listening on a high port, as my provider is blocking the low <1024 ports, which is a problem when I'm on a network which only allows outbound SSH connections on port 22. It's easy to hop around by first connecting to another server on regular port 22, but automating that with Patrick's proxycommand-plus-netcat trick proved to be handy in this situation, too.

I could easily add an entry to .ssh/config to manage this specific situation.
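Such an entry could look roughly like the one below. This is a sketch with made-up values: the aliases home and gateway, the hostname and the port are placeholders, and the entry is written to a temporary file so the demo does not touch a real ~/.ssh/config. It also assumes gateway is itself a host (or another config entry) that ssh can reach on port 22.

```shell
#!/bin/sh
# Sketch: a single-host entry that always hops through a gateway.
# All names and numbers below are placeholders.
cfg=$(mktemp)   # demo file; the real thing would live in ~/.ssh/config
cat > "$cfg" <<'EOF'
Host home
    HostName home.example.org
    Port 2222
    ProxyCommand ssh gateway nc %h %p
EOF

cat "$cfg"
# To try it against real hosts: ssh -F "$cfg" home
```

ssh replaces %h and %p with the HostName and Port of the entry, so netcat on the gateway connects to the high port at home.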

Now, I work on the typical firewalled networks I need to use either on their premises or remotely. When I need to connect remotely, especially for a longer period, I can often use a dedicated VPN, but very often I just want to quickly check something on one host. Then entering the network by hopping through an SSH gateway, rather than launching a full-blown VPN stack, tends to be the preferred solution.

Normally, you need a .ssh/config entry for each host, and then a second entry for each host that uses a specific ProxyCommand.

But this doesn’t scale well. One would need to manage redundant information. I don’t want to configure .ssh/config entries for each server separately to be reached from within its own LAN and remotely through SSH hopping.

I didn’t find a way to handle this with a config in .ssh/config only, but adding the following little (Bash) function to my environment gives me an extra ssh command which lets me use the same .ssh/config entries for both situations:

ssh-via () { local proxy=$1 ; shift ; ssh -o ProxyCommand="ssh $proxy nc %h %p" "$@" ; }

Just use ssh-via instead of plain ssh, and let the first parameter be the name of the .ssh/config entry for the gateway you need to use.
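To make clear what ssh-via actually runs, here is a dry-run sketch that only prints the resulting command instead of connecting. The function name ssh_via_dryrun and the host names gateway and internalhost are hypothetical, chosen for this illustration.

```shell
#!/bin/sh
# Dry-run variant: print the ssh command line instead of executing it.
# ssh_via_dryrun, gateway and internalhost are made-up names.
ssh_via_dryrun () {
    proxy=$1
    shift
    # %h and %p are left as-is; ssh itself fills in host and port later.
    printf '%s\n' "ssh -o ProxyCommand=\"ssh $proxy nc %h %p\" $*"
}

# First argument: the gateway entry. The rest: what you would pass to ssh.
ssh_via_dryrun gateway internalhost uptime
```

This prints ssh -o ProxyCommand="ssh gateway nc %h %p" internalhost uptime, which is the command the real function would execute for the same arguments.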