Want to Be a Better System Administrator? Learn to Code and Think Like a Developer

We are living in times when you hear about DevOps everywhere, and about how the walls between the two worlds of Development and Operations should be removed. But all these speeches take the point of view of the developer and the business, never the point of view of the administrator.

We come from a time when operations teams were split into several levels of escalation, where each level was supposed to be less populated and more skilled than the previous one. So we had a first level of people with basic knowledge, working 24×7 and covering any kind of incident that could happen. When anything happened, they tried to solve it with their knowledge (usually more documents than knowledge…), and when something did not work as expected they forwarded it to a second level with more knowledge about the platform, probably an on-call team, and so on through as many levels as wanted. How does all of this fit with DevOps, CI & CD and so on? OK, pretty easily.

Level 1 doesn’t exist today. Monitoring tools, CI & CD and so on make this first level unnecessary: if you can write a document with the steps to follow when something goes wrong, you are writing code, just inside a document, so nothing stops you from delivering an automated tool that does the same. So, in plain English, yesterday’s first-level operators are now scripts. But we still need our operations teams, our 24×7 service and so on, for sure, because from time to time (more often than we’d like) something out of the ordinary happens and that needs to be managed.
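To make the "a procedure document is already code" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `Step` and `run_runbook` names, the toy `state` dictionary and the two example checks do not come from any real tool. Each step is one line of the old written procedure (check something, apply the documented fix if it is wrong), and whatever is still broken after the fix is what gets escalated to L2.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One line of the old written procedure (hypothetical structure)."""
    description: str
    is_healthy: Callable[[], bool]   # the "verify that ..." part of the document
    remediate: Callable[[], None]    # the "if not, then do ..." part


def run_runbook(steps: List[Step]) -> List[str]:
    """Run every step in order and return the descriptions that need escalation."""
    escalate = []
    for step in steps:
        if step.is_healthy():
            continue
        step.remediate()
        # Re-check after the documented fix; what is still broken goes to L2.
        if not step.is_healthy():
            escalate.append(step.description)
    return escalate


def demo() -> List[str]:
    # Toy state standing in for a real platform (purely illustrative).
    state = {"service_up": False, "disk_free_pct": 3}

    def restart_service() -> None:
        state["service_up"] = True    # pretend the restart worked

    def clean_temp_files() -> None:
        state["disk_free_pct"] += 2   # pretend cleanup frees only a little space

    steps = [
        Step("service is running", lambda: state["service_up"], restart_service),
        Step("disk has at least 10% free",
             lambda: state["disk_free_pct"] >= 10, clean_temp_files),
    ]
    return run_runbook(steps)
```

In this toy run the restart fixes the first step, while the disk step stays unhealthy after cleanup, so it is the only one handed over to a human.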

So, automation is never going to replace L2 & L3. You are going to need skilled people to handle incidents. Maybe you can have a smaller team as you automate more processes, but you can never get rid of the knowledge part; that’s not the point. We can discuss whether this team should be the development team or a mixed team from both worlds, and either could be right. Any approach is valid here. So, we’ve implemented all our fancy new CI & CD processes and monitoring tools, and the platform seems to be running without any help or support until something really strange happens. So, what do we do with the former first-level people? Of course, teach them the skills to be valuable as L2 & L3, so they have to become better operators / system administrators / whatever word you like the most. And how can they do that?

As I said before, we are moving away from a world where operations teams worked from written procedures and were not allowed to look beyond the approved protocol. That’s no longer the way L2 & L3 work. When you are facing an incident, the procedure is pretty much the same as hunting a bug or, if we step outside the IT world, like solving a crime. What are the similarities between solving a crime and managing a platform? OK, let’s enumerate them:

  1. What? What happened to my system? You start with the consequences of the issue: probably an error log trace, probably another system’s team calling you because your system is unavailable. OK, here you have it, this is your dead body.
  2. When? Once you know something went wrong, you start looking for the root cause. You search the log traces for the first one that generated the problem, you discard the traces that are only consequences of that first one, and you try to find when everything started to fail. To do that, you need to look for evidence about the crash. So now you are investigating, searching for evidence, talking to witnesses (yes, your log traces are the most trustworthy witnesses you are going to find, and they rarely lie. It’s as if they were on the stand in front of a judge).
  3. And now? How & Why? That’s the difficult point. How and why are the next steps, just as in bug hunting, but the main difference is that when the dev team is hunting a bug, they can correlate the evidence gathered in step two with the source code they built to find out how and why everything went wrong. In your case, as a system administrator, you are probably facing a proprietary system, or you don’t have access to the code, or you wouldn’t know how to tackle it even if it were open source, and you probably don’t even have access to the dev team’s source code. So, how do you solve this?
  • OK, most of you are probably thinking something like: knowing the product and your platform. Being a certified operator of the product you are managing, knowing the product’s whole manual, and so on. And that could be helpful, because it means you know better how things work at a high level… but let’s be clear: have you ever found, in a certification course, exam, documentation or whatever, information so low-level that it could help with the specific case you are facing? If the answer to my question is yes, maybe you are not facing a difficult bug but a basic configuration error.
  • So, what can we do? As the title said: learn to code. But you are probably thinking: how is knowing how to code related to hunting a bug when I don’t even have access to the code to take a look? And learn to code in what language? The one used by the components managed in my platform? Java? Go? Node.js? C++? x86 assembly? All of them? OK, you’re right, maybe the answer is not simply “learn to code”, but that’s the idea: learn to code, learn to design, learn to architect solutions. Do you want to know why? OK, let’s see. In my whole career I’ve worked with a lot of different products, different approaches, different paradigms, different base languages, different everything, but all of them share one thing that all systems nowadays share: they are built by people.

Yes, each piece of software, each server, each program, each web page, each everything is built by a person, like you and me.

If you think that all products and pieces of software are made by geniuses, you are wrong. Are you aware of how many pieces of software are available? Do you think there are that many geniuses all over the world? Of course, they are skilled people and some of them are truly brilliant, and that’s why they usually follow common sense when they architect, design and build their solutions.

And that’s the point we can use to crack our case and solve our murder, because with the evidence we have and our ideas about building solutions we can ask: OK, how would I have built this if I were the one in charge of this piece of software? And you are going to see that you are right almost every time…

But I’m missing another important point that we left unanswered before: learn to code in which language? In the one your platform is based on. If you are managing an OSGi-based platform, learn a lot of Java development and OSGi development and architecture, and you are going to find that all the issues come down to the same thing: a dependency between two OSGi modules, an Import-Package statement that should be there, the order in which someone loads the packages, or an Export-Package statement that should be there…
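As an illustration of the Import-Package / Export-Package kind of issue, here is a small sketch in Python that checks whether every imported package is exported by some bundle. It is a deliberately simplified model and not how an OSGi framework actually resolves bundles: it ignores version ranges, optional imports, Require-Bundle, dynamic imports and packages provided by the runtime itself. The function names and the bundle and package names used with it are hypothetical.

```python
from typing import Dict, List, Set


def split_clauses(value: str) -> List[str]:
    """Split a manifest header value on commas that are outside double quotes."""
    clauses, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            clauses.append("".join(current))
            current = []
        else:
            current.append(ch)
    clauses.append("".join(current))
    return [c.strip() for c in clauses if c.strip()]


def parse_packages(header_value: str) -> Set[str]:
    """Return the package names of a header, dropping attributes and directives."""
    packages = set()
    for clause in split_clauses(header_value):
        for segment in clause.split(";"):
            segment = segment.strip()
            # Attributes (version="1.0") and directives (uses:="...") contain '='.
            if segment and "=" not in segment:
                packages.add(segment)
    return packages


def unresolved_imports(bundles: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each bundle name to the imported packages that no bundle exports."""
    exported: Set[str] = set()
    for headers in bundles.values():
        exported |= parse_packages(headers.get("Export-Package", ""))
    missing = {}
    for name, headers in bundles.items():
        gap = parse_packages(headers.get("Import-Package", "")) - exported
        if gap:
            missing[name] = gap
    return missing
```

Given a hypothetical "orders" bundle that imports `com.example.billing` and `com.example.audit`, and a "billing" bundle that exports only `com.example.billing`, the function reports `com.example.audit` as the missing piece for "orders". That is the sort of reasoning you can do from the evidence once you know how the platform is supposed to wire modules together.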

The same goes if you are running a .NET desktop application: learn a lot of .NET development and you’ll be skilled enough not to need a document telling you what to do, because you know how this should work, and that is going to lead you to why this is happening.

And with all those questions answered, only one thing is left. You need to put in motion a plan to mitigate or solve the issue, so that it never happens again. And with all of that, we file our arrest warrant for the incident.

Finally, you are at the court part, where you present your evidence and your theory about how and why this happened (the motive 😛), and you should be able to convince the jury (the customer) beyond a reasonable doubt. You finish with the sentence you asked for the bug/crash/incident, which is the mitigation plan, and your platform is a better world with one less incident walking free.

What we describe here is how to do a post-mortem analysis, and for most of you this is probably daily stuff. But whenever we work at customers in collaboration with the operations team, we notice that they don’t follow this approach, so they get stuck because they don’t have a document that tells them what to do (step by step) in these strange situations.

So, I’d like to finish with an anthem to summarize all of this. When you are facing an incident: “Keep calm, apply common sense and start reading the log traces!!”
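As a small companion to the "When?" step of the investigation, here is a sketch in Python of "start reading the log traces": it merges the traces of several components and returns the earliest ERROR, which is the first candidate for the root cause, while later errors are more likely to be consequences. The log line format (`<ISO timestamp> <LEVEL> <message>`), the component names and the function names are all assumptions for illustration; real platforms log in many different formats.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple, Optional


class Trace(NamedTuple):
    when: datetime
    level: str
    source: str
    message: str


def parse_line(source: str, line: str) -> Optional[Trace]:
    """Parse '<ISO timestamp> <LEVEL> <message>'; return None for anything else."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        return None
    try:
        when = datetime.fromisoformat(parts[0])
    except ValueError:
        return None
    return Trace(when, parts[1], source, parts[2])


def first_error(logs: Dict[str, List[str]]) -> Optional[Trace]:
    """Merge the traces of every component and return the earliest ERROR."""
    errors = [
        trace
        for source, lines in logs.items()
        for trace in (parse_line(source, line) for line in lines)
        if trace is not None and trace.level == "ERROR"
    ]
    return min(errors, key=lambda t: t.when, default=None)
```

The earliest error is only a starting hypothesis, not a verdict: clocks can drift between machines and the real cause may never have been logged as an error at all, so common sense still has to do the rest.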

Microservices vs SOA in Enterprise Integration: When to Use Each

The Microservices Hype: Why Everyone Wants to Apply It Everywhere

For the last two years, everyone has been talking about microservices and wanting to apply them everywhere.

It’s the same story with containers and Docker (and it was the same before with the cloud approach, and even before that with SOA, BPM, EDA…).

Anything that gets enough buzz from the community results in all customers (and “all kinds” of customers) trying to apply the “new fashion” no matter what. Because of that, all the system integrators are searching for somewhere it fits (or even where it doesn’t fit…) to apply this “new thing”, because it is “what we have to do now”. It’s like the fashion business. What is trendy today? That? OK, let’s do that.

Don’t get me wrong, this post is not going to be against microservices, because I love the concept, I love the advantages that come with it and how good it is to move to this kind of model.

But I’d like to talk about some specific aspects that are not part of the usual talks and experience reports about this kind of technology. This pattern, model or paradigm is great and it is a proven success.

You can watch any Adrian Cockcroft talk about his experience at Netflix to be sure this is not only a buzzword (#NotOnlyABuzzWord), but can it be used in all cases?

When we watch a talk about microservices it is usually the same story: microservices vs. the monolithic application, especially web applications following some kind of client-server pattern, an MVC pattern or something similar. And for that kind of application it is great, simple and clear.

Microservices in SOA Enterprise Environments: The Real Challenge

But what about enterprise applications where we’ve been following a SOA approach for decades: are microservices applicable here?

For sure there are a lot of differences between the microservice approach (the pure one, the one Martin Fowler described in his article) and the SOA paradigm. They don’t share the same principles, but in the end they are closer to each other than the usual contestants you see in all the talks (monolithic web app vs. microservices).

Microservices talk about breaking the monolith, and that’s easy for a web application, but what about a SOA architecture? In that case it is not even possible to go down that path.

If you’ve ever worked in enterprise integration, you’ve seen legacy silos that must be kept untouched. These enterprise systems often cannot be decomposed into microservices. That is not open to discussion.

There are different reasons for that decision: it could be because they are so legacy that no one knows about them or how they do what they do, or because they are so critical that no one is going to go down that path, or simply because there is no business case to justify replacing this kind of silo.

Hybrid Approach: Combining Microservices Benefits with SOA Architecture

So what now? Can we adopt microservices or should we stick with the SOA approach?

Microservices are not only about breaking the silos, but that part is very important, so no, we cannot go down the pure microservices path for enterprise integration. But we can gather all the other advantages that microservices bring and try to apply them to our integration layer (at that point we would no longer be talking about SOA architecture, because most of these advantages go against some of the SOA principles).

The microservices wave is also about Agile & DevOps, about being faster, being automated, being better, reducing your time to market. It is about cloud (not in the sense of public cloud, but in the sense of not being tied to your infrastructure). It is about all of those things too.

So, microservices are about so many things that we can apply a lot of them even if we can’t go 100% down this path. There are several names for this approach, like Service-Based Architecture, but I much prefer “micro-services” (with a dash in between, meaning services that are micro) because I think it explains the approach better.

So, we’d like to build smaller services to be more flexible and to be able to apply all these DevOps things, and there we can apply all the other things related to the microservices wave.

And that’s not something new; it is not something that is starting now or started in the last few years.

It is something I’ve been seeing since the beginning of my career (a decade ago), when I started working with TIBCO AMX BusinessWorks, which gives you the chance to decide the scope of your services yourself: depending on your needs you could create “Enterprise Applications”, or you could go for “Enterprise Services” or “Small Services” that worked together to do the job.

And that path has been followed not only by TIBCO but by other companies as well, with the ESB concept evolving to adapt to the new era into something more like a PaaS that allowed you to run your services in “some kind” of containerized world.

For example, with the TIBCO AMX Platform (from 2006) you could develop your services and applications using several kinds of languages and options, like the graphical editor for mediations, Java, C/C++, .NET, Spring and so on, using the SCA standard and running on an elastic OSGi-based platform where you could manage all of them in the same way (sounds familiar, right? 🙂).

Service Reusability: SOA vs Microservices Trade-offs

What about service reusability? The SOA paradigm sets very high standards to ensure service reuse, with an enterprise registry and repository… while microservices are (at the beginning) against reuse: you should duplicate instead of reusing, in order to be self-contained and more free. But the latest advances in microservices include an orchestration layer, things like Conductor, which go down the path of reuse and orchestration. So we can find a middle ground, where you reuse when possible but don’t sacrifice your agility to ensure 100% reuse of the opportunities available. Time to market is the critical driver for everyone now, and all the “principles” have to adapt to that.

What about DevOps and cloud? No problem: here you can apply the same techniques you were using previously. Infrastructure as Code, containers, Continuous Integration & Deployment and so on.

What about agile standards such as REST/JSON and so on? No problem here either.

In summary, you can adopt and implement most of the flavors and components of the microservices movement, but you need to compromise on others, and you are not going to be using “pure” microservices; you are going to be using something else, and that’s not bad. You always have to adapt any paradigm to your specific use case.