The DevOps 2.3 Toolkit
上QQ阅读APP看书,第一时间看更新

A short history of infrastructure management

A long time ago in a galaxy far, far away...

We would order servers and wait for months until they arrive. To make our misery worse, even after they come, we'd wait for weeks, sometimes even months, until they are placed in racks and provisioned. Most of the time we were waiting for something to happen. Wait for servers, wait until they are provisioned, wait until you get approval to deploy, then wait some more. Only patient people could be software engineers. And yet, that was the time after perforated cards and floppy disks. We had internet or some other way to connect to machines remotely. Still, everything required a lot of waiting.

Given how long it would take to have a fully functioning server, it came as no surprise that only a select few had access to them. If someone does something that should not be done, we could face an extended downtime. On top of that, nobody knew what was running on those servers. Since everything was being done manually, after a while, those servers would become a dumping ground. Things get accumulated over time. No matter how much effort is put into documentation, given enough time, the state of the servers would always diverge from the documentation. That is the nature of manual provisioning and installations. Sysadmin became a god-like person. He was the only one who knew everything or, more likely, faked that he does. He was the dungeon keeper. He had the keys to the kingdom. Everyone was replaceable but him.

Then came configuration management tools. We got CFEngine. It was based on promise theory and was capable of putting a server into the desired state no matter what its actual state was. At least, that was the theory. Even with its shortcomings, CFEngine fulfilled its primary objective. It allowed us to specify the state of static infrastructure and have a reasonable guarantee that it will be achieved. Aside from its main goal, it was an advance towards documented servers setup. Instead of manual hocus-pocus type of actions which resulted in often significant discrepancies between documentation and the actual state, CFEngine allowed us to have a specification that (almost) entirely matches the actual state. Another big advantage it provided is the ability to have, more or less, the same setup for different environments. Servers dedicated to testing could be (almost) the same as those assigned to production. Unfortunately, usage of CFEngine and similar tools were not yet widespread. We had to wait for virtual machines before automated configuration management become a norm. However, CFEngine was not designed for virtual machines. They were meant to work with static, bare metal servers. Still, CFEngine was a massive contribution to the industry even though it failed to get widespread adoption.

After CFEngine came Chef, Puppet, Ansible, Salt, and other similar tools. Life was good until virtual machines came into being or, to be more precise, became widely used. We'll go back to those tools soon. For now, let's turn to the next evolutionary improvement.

Besides forcing us to be patient, physical servers were a massive waste in resource utilization. They came in predefined sizes and, since waiting time was considerable, we often opted for big ones. The bigger, the better. That meant that an application or a service usually required less CPU and memory than the server offered. Unless you do not care about costs, that meant that we'd deploy multiple applications to a single server. The result was a dependencies nightmare. We had to choose between freedom and standardization.

Freedom meant that different applications could use different runtime dependencies. One service could require JDK3 while the other might need JDK4. A third one might be compiled with C. You probably understand where this is going. The more applications we host on a single server, the more dependencies there are. More often than not, those dependencies were conflicting and would produce side effects no one expected. Thanks to our inherent need to convert any expertise into a separate department, those in charge of infrastructure were quick to dismiss freedom in favour of reliability. That translates into "the easier it is for me, the more reliable it is for you." Freedom lost, standardization won.

Standardization starts with systems architects deciding the only right way to develop and deploy something. They are a curious bunch of people. With the risk of putting everyone in the same group and ridiculing the profession, I'll describe an average systems architect as a (probably experienced) coder that decided to climb his company's ladder. While on the subject of ladders, there are often two of those. One is the management ladder that requires an extensive knowledge of Microsoft Word and Excel. Expert knowledge of all MS Office tools is a bonus. Those who mastered MS Project were considered the ultimate experts. Oh, I forgot about email skills. They had to be capable of sending at least fifteen emails a day asking for status reports.

Most expert coders (old timers) would not choose that path. Many preferred to remain technical. That meant taking over systems architect role. The problem is that the "technical path" was often a deceit. Architects would still have to master all the management skills (for example, Word, Excel, and email) with the additional ability to draw diagrams. That wasn't easy. A systems architect had to know how to draw a rectangle, a circle, and a triangle. He had to be proficient in coloring them as well as in connecting them with lines. There were dotted and full lines. Some had to end like an arrow. Choosing the direction of an arrow was a challenge in itself so the lines would often end up with arrows at both ends.

The important part of being an architect is that drawing diagrams and writing countless pages of Word documents was so time demanding, that coding stopped being something they do. They stopped learning and exploring beyond Google search and comparative tables. The net result is that the architecture would reflect knowledge an architect had before they jumped to the new position.

Why am I talking about architects? The reason is simple. They were in charge of standardization demanded by sysadmins. They would draw their diagrams and choose the stack, developers would use. Whatever that stack was, it was to be considered Bible and followed to the letter. Sysadmins were happy since there was a standard and a predefined way to set up a server. Architects were thrilled because their diagrams served a purpose. Since those stacks were supposed to last forever, developers were excited since there was no need for them to learn anything new. Standardization killed innovation, but everyone was happy. Happiness is necessary, isn't it? Why do we need Java 6 if JDK2 works great? It's been proven by countless diagrams.

Then came Virtual machines and broke everyone's happiness.

Virtual machines (VMs) were a massive improvement over bare metal infrastructure. They allowed us to be more precise with hardware requirements. They could be created and destroyed quickly. They could differ. One could host Java application, and the other could be dedicated to Ruby on Rails. We could get them in a matter of minutes, instead of waiting for months. Still, it took quite a while until "could" became "can". Even though the advantages brought by VMs were numerous, years passed until they were widely adopted. Even then, the adoption was usually wrong. Companies often moved the same practices used with bare metal servers into virtual machines. That is not to say that adopting VMs did not bring immediate value. Waiting time for servers dropped from months to weeks. If it wasn't for administrative tasks, manual operations, and operational bottlenecks, they could have reduced waiting time to minutes. Still, waiting for weeks was better than waiting for months. Another benefit is that we could have identical servers in different environments. Companies started copying VMs. While that was much better than before, it did not solve the problem of missing documentation and the ability to create VMs from scratch. Still, multiple identical environments are better than one, even if that meant that we don't know what's inside.

While the adoption of VMs was increasing, so did the number of configuration management tools. We got Chef, Puppet, Ansible, Salt, and so on. Some of them might have existed before VMs. Still, virtual machines made them popular. They helped spread the adoption of "infrastructure as code" principles. However, those tools were based on the same principles as CFEngine. That means that they were designed with static infrastructure in mind. On the other hand, VMs opened the doors to dynamic infrastructure where VMs are continuously created and destroyed. Mutability and constant creation and destruction were clashing. Mutable infrastructure is well suited for static infrastructure. It does not respond well to challenges brought with dynamic nature of modern data centers. Mutability had to give way to immutability.

When ideas behind immutable infrastructure started getting traction, people began combining them with the concepts behind configuration management. However, tools available at that time were not fit for the job. They (Chef, Puppet, Ansible, and the like) were designed with the idea that servers are brought into the desired state at runtime. Immutable processes, on the other hand, assume that (almost) nothing is changeable at runtime. Artifacts were supposed to be created as immutable images. In case of infrastructure, that meant that VMs are created from images, and not changed at runtime. If an upgrade is needed, new image should be created followed with a replacement of old VMs with new ones based on the new image. Such processes brought speed and reliability. With proper tests in place, immutable is always more reliable than mutable.

Hence, we got tools capable of building VM images. Today, they are ruled by Packer. Configuration management tools quickly jumped on board, and their vendors told us that they work equally well for configuring images as servers at runtime. However, that was not the case due to the logic behind those tools. They are designed to put a server that is in an unknown state into the desired state. They assume that we are not sure what the current state is. VM images, on the other hand, are always based on an image with a known state. If for example, we choose Ubuntu as a base image, we know what's inside it. Adding additional packages and configurations is easy. There is no need for things like "if this then that, otherwise something else." A simple shell script is as good as any configuration management tool when the current state is known. Creating a VM image is reasonably straightforward with Packer alone. Still, not all was lost for configuration management tools. We could still use them to orchestrate the creation of VMs based on images and, potentially, do some runtime configuration that couldn't be baked in. Right?

The way we orchestrate infrastructure had to change as well. A higher level of dynamism and elasticity was required. That became especially evident with the emergence of cloud hosting providers like Amazon Web Services (AWS) and, later on, Azure and GCE. They showed us what can be done. While some companies embraced the cloud, others went into defensive positions. "We can build an internal cloud", "AWS is too expensive", "I would, but I can't because of legislation", and "our market is different", are only a few ill-conceived excuses often given by people who are desperately trying to maintain status quo. That is not to say that there is no truth in those statements but that, more often than not, they are used as an excuse, not for real reasons.

Still, the cloud did manage to become the way to do things, and companies moved their infrastructure to one of the providers. Or, at least, started thinking about it. The number of companies that are abandoning on-premise infrastructure is continuously increasing, and we can safely predict that the trend will continue. Still, the question remains. How do we manage infrastructure in the cloud with all the benefits it gives us? How do we handle its highly dynamic nature? The answer comes in the form of vendor-specific tools like CloudFormation or agnostic solutions like Terraform. When combined with tools that allow us to create images, they represent a new generation of configuration management. We are talking about full automation backed by immutability.

We're living in an era without the need to SSH into servers.

Today, modern infrastructure is created from immutable images. Any upgrade is performed by building new images and performing rolling updates that will replace VMs one by one. Infrastructure dependencies are never changed at runtime. Tools like Packer, Terraform, CloudFormation, and the like are the answer to today's problems.

One of the inherent benefits behind immutability is a clear division between infrastructure and deployments. Until not long ago, the two meshed together into an inseparable process. With infrastructure becoming a service, deployment processes can be clearly separated, thus allowing different teams, individuals, and expertise to take control.

We'll need to go back in time one more time and discuss the history of deployments. Did they change as much as infrastructure?