the way I saw it: platform


Friday, July 19, 2013

providers, providers, providers.. do not kill me, help me!

When you have a business, it is important to be careful with every aspect, but there are two that really matter to me:

The people you hire
Your service providers

I use that order because I believe it is the correct one. In the end, an organization (company, enterprise, or whatever you want to call it) is a structure with people giving meaning to that structure, and the most important element IS THE PEOPLE.

The second point is about service providers, because many of the businesses we build today rely on services that we hire. History teaches us that providers can fail. For example, if you have a blog on wordpress.com, they could suffer an outage and take your blog down. That is bad, but imagine having your core business app hosted in a place that goes down (at that point you could start crying). If you do not want to cry, what you need are really reliable providers.

Talking about providers, we expect help from them. A cleaning company that leaves more trash than it collects is not a good deal. Let's face it, we need more help than problems, and that, my friends, is a good indicator of the quality of a provider: the ratio of problems to help has to be almost zero.

To be honest, problems happen and we will have to deal with them, but what we do not need (or want) is the same problems over and over.

I will give you two examples:

AWS: they are the leading provider of cloud computing services, and they have (in my personal opinion) one of the most amazing stacks of cloud-oriented services. Obviously, they will charge you for breathing if they get the chance. They have premium support that can be really expensive; for me it would be something like USD 2,000 a month (OUCH). If you are not a member of the premium group, then you are screwed!! You can open a ticket and pray for a solution in less than 2-3 days, or post something in the AWS forum and, again, pray for a solution. For example, the AWS Management Console showed me that I had a pending "EVENT": an instance that was running on "DEGRADED HARDWARE". One of the solutions AWS suggests is to stop and start the instance. Well, I did that... and the instance was stuck in the stopping state for 5 hours. Thankfully it was not a really important server, otherwise that would have meant 5 hours of downtime. Obviously, I reported the issue after 5 minutes, but the solution arrived a "little bit" later.
Time Warner Cable (TWC): do I need to say more?? Anyway, I will. Imagine you have an Internet-based product, so you need a reliable connection. Well, TWC offers connections of 30Mbps/3Mbps, 30Mbps/5Mbps, 50Mbps/3Mbps and 50Mbps/5Mbps (something like that). Usually, if you get 50% of that, you are really lucky, and traceroutes can vary from 20ms per hop to 1200ms per hop within a five-minute period. By the way, if you call support, they will give you reasons like "it is related to solar flares" (really, they told me that once, and my connection is coaxial) or "it could be a problem with the AMPLIFIERS" (OMG, do I have to plug headphones into the router?? Where are the speakers?). Whatever they say, the final sentence is always the same: "It should be fixed in a while, wait for a couple of hours". In real life, that means something like "stop bothering us because we are busy playing solitaire".

Do those examples sound like reliable providers??

I can give you some personal keys:

Try to build your apps on disposable servers, so you can get over any kind of situation like the "DEGRADED HARDWARE" case. Volatile is key!
If you cannot make it volatile, make it easy to recover, maybe with Amazon AMIs or any other kind of machine image, backups with fast recovery, and fast deploys. Chef is awesome, but a recipe that takes 1 hour to run is a really BAD option, so speed it up.
Get redundancy; do not put all your eggs in one basket. For example, if you are on AWS, use multiple regions and multiple availability zones in each region.
If you can combine multiple providers, DO IT!!
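
To make the redundancy idea a bit more concrete, here is a minimal sketch in Python of picking the first healthy endpoint from an ordered list that spans regions and providers. Everything in it is an assumption for illustration: the hostnames are made up, and the health check is passed in as a function so you can plug in whatever check fits your platform.

```python
from typing import Callable, Iterable, Optional

# Hypothetical endpoints: two regions of one provider plus a second provider.
ENDPOINTS = [
    "app.us-east-1.example.com",
    "app.us-west-2.example.com",
    "app.other-provider.example.com",
]


def first_healthy(
    endpoints: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated as an unhealthy endpoint.
            continue
    return None
```

For example, `first_healthy(ENDPOINTS, lambda e: e != ENDPOINTS[0])` simulates the first region being down and returns the second one. The ordering of the list is the whole "policy" here; a real setup would probably do this at the DNS or load balancer level.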

Wednesday, July 10, 2013

orchestration is one step for growing

When you are taking care of a platform, sometimes you need to run some tasks on a set of servers. Everything goes like a charm if you have a few servers, but imagine that your platform grows from 10 servers to 200 servers. Well, that could make your job a little bit awful.

Even if you have an image of the server, every time you need a change you can create a new image, but you will still need to replace all the running servers or coordinate an update. Obviously you cannot do that manually, so you need some kind of orchestration. Orchestration refers to the management and coordination of tasks in big systems.
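
As a rough illustration of what "coordinating an update" can mean, here is a small Python sketch of a rolling update: servers are updated in batches, and the rollout stops at the first batch that reports a failure. The `update` callable is a placeholder (an assumption of mine) for whatever really replaces or reconfigures a server.

```python
from typing import Callable, Iterator, List, Sequence


def batches(servers: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` servers."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(servers), size):
        yield list(servers[start:start + size])


def rolling_update(
    servers: Sequence[str],
    size: int,
    update: Callable[[str], bool],
) -> List[str]:
    """Update servers batch by batch and return the ones that succeeded.

    The rollout stops after the first batch containing a failure, so a bad
    change never reaches the whole platform.
    """
    updated: List[str] = []
    for batch in batches(servers, size):
        results = [update(server) for server in batch]
        updated.extend(s for s, ok in zip(batch, results) if ok)
        if not all(results):
            break
    return updated
```

With 200 servers and a batch size of 20, a broken image would hit at most 20 servers before the rollout stops.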

At this point you will find many people suggesting tools. I would like to give an opinion that, by the way, is not final because I am still testing. As usual, I will give you choices, so you can walk whatever YOU think is the right path.

Let's start with Puppet. This tool is awesome: you can do almost everything with it, from monitoring to orchestrating. But let's be honest, it sounds like too much, and it is. It will probably be awesome, but it will be like cooking a steak in a nuclear reactor. Anyway, you can give it a chance.

Second tool: Chef. It is really amazing how it can automate configuration and installation and keep track of configuration. Actually, I am trying to fall in love with it, but our relationship is complicated; you need to be very pragmatic, practical, and organized. When you use a tool like this, you need to know when to start doing things your own way. It is better to keep the recipes the original way, but any real-life chef knows that they can twist a recipe to make it taste the way they want.

My last option: do it yourself!! For example, I have built a set of scripts in Python + Fabric that let me do many things against a set of servers in AWS. I can define common tasks, like cleaning up logs, and run them against all the web servers, or I can open a virtual console that lets me send custom commands to a set of servers, even in different regions. Actually, I love Fabric; I think that framework is AWESOME.
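
To give a flavor of the do-it-yourself route, here is a minimal sketch of running one command against every server in a role. It is not my Fabric code: to keep it self-contained it uses only the Python standard library and shells out to the `ssh` command, and the `ROLES` inventory and hostnames are invented for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical inventory: role name -> hosts (could span several regions).
ROLES: Dict[str, List[str]] = {
    "web": ["web1.example.com", "web2.example.com"],
    "db": ["db1.example.com"],
}


def ssh_run(host: str, command: str) -> str:
    """Run a command on a host through the local ssh client, return stdout."""
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True,
        text=True,
        timeout=60,
        check=True,  # raise if the remote command exits with an error
    )
    return result.stdout


def run_on_role(
    role: str,
    command: str,
    runner: Callable[[str, str], str] = ssh_run,
    inventory: Dict[str, List[str]] = ROLES,
) -> Dict[str, str]:
    """Run a command on every host of a role in parallel."""
    hosts = inventory[role]
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        outputs = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, outputs))
```

A call like `run_on_role("web", "uptime")` would return the output per host. The `runner` parameter is there so you can swap in a fake for testing, or a Fabric-based runner if you prefer.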

You need to test every option yourself and take into account the size and characteristics of your business. There is no treasure map; there are clues, and you have to build your own map.

Enjoy the ride!

Monday, July 8, 2013

the nightmare of shared passwords

Today my title is clear. Every time someone joins the operations team (call us devops, sysadmins, etc.) we need to start sharing passwords for different services, and at that point we start hearing the razor-armed glove scraping along a pipe (just like in A Nightmare on Elm Street). And now I will scare you to death: imagine that after months of handing out passwords for services, one of the members of the team leaves the company. OMG!! Kill me now, Mr. Krueger!!

Well, there are some password managers that could give us clues and help us, but keep in mind that I am not 100% comfortable with any of them. I would like something more custom, adapted to my concepts of safety, simplicity, and usability.

We have three choices:

Hosted services, and by hosted I mean a service that someone else provides (here my spidey sense starts alerting me about someone else having my passwords).
Standalone apps: well, nobody else has my passwords, and I could share the password DB, but keeping it synchronized really sucks.
Mixed environment: now we are talking, this sounds more like a good choice.

Some of the tools available:

Some are hosted, like CommonKey; some are standalone, like KeePassX; and some, like 1Password, can be set up to store the DB in Dropbox.

This is my wish list of functionalities:

Privately hosted
Clients for multiple platforms (Mac, Linux, Windows, Android, iOS)
Secure communication between clients and server
Access control
Easy synchronization
Easy modification of information stored in the DB
Secure storage of the information

At this point, my choice is a kind of hybrid solution, built manually on top of Dropbox and using 1Password as the client. In the end, the idea is to be able to grant or revoke access as easily as using Dropbox. The bad part is that you need to pay for 1Password and you depend on Dropbox.

As an extra point, let me tell you that keeping passwords this way gives you an amazing option when someone leaves the organization, because you have a list of the passwords to change and a direct way to share the new ones.
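
The "list of passwords to change" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `ACCESS` mapping, the entry names, and the people in it are all invented, and neither 1Password nor Dropbox exposes it in this form.

```python
from typing import Dict, List, Set

# Hypothetical record of who can see which shared entry.
ACCESS: Dict[str, Set[str]] = {
    "aws-root": {"alice", "bob"},
    "dns-registrar": {"alice"},
    "monitoring-admin": {"bob", "carol"},
}


def passwords_to_rotate(access: Dict[str, Set[str]], leaver: str) -> List[str]:
    """List every entry the leaving person could see, sorted by name."""
    return sorted(entry for entry, people in access.items() if leaver in people)


def revoke(access: Dict[str, Set[str]], leaver: str) -> Dict[str, Set[str]]:
    """Return a copy of the access map without the leaving person."""
    return {entry: people - {leaver} for entry, people in access.items()}
```

If bob leaves, `passwords_to_rotate(ACCESS, "bob")` gives the two entries to change, and `revoke(ACCESS, "bob")` tells you who should receive the new ones.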

I hope this post gives you more ideas than solutions, and if you decide to solve this problem, count on me as a developer and tester.

Tuesday, July 2, 2013

keeping an eye in the platform

Sometimes we feel like our platform is not doing well; maybe we see slowness in the service. Well, the answer lies in questions like:

Are you ready to grow??
What is the performance of your platform??
Do you know where the bottlenecks are??

Well, in order to discover those answers, you need an eye on your platform (maybe more than one). Today that can be fun and simple, and above all very powerful, because of the amazing number of tools available.

If you are using a cloud provider you will get some basic metrics, but the power is somewhere else. I will give you some options and, as usual, you will have the task of choosing whatever you like and whatever makes you feel comfortable.

nagios: basically a nice alerting system. It runs check plugins (over SNMP, NRPE, and others) to get metrics from servers and shows you the status per host/service (ok, warning, critical).
cacti: it is just graphing; it polls metrics over SNMP.
munin: helpful for collecting data because it has its own node/master architecture (munin-node on each server, polled by a munin master). It draws graphs too, but I see it as a monitoring tool more than a graphing tool.
collectd: I am starting to feel love for this client/server option (like munin).
graphite: a nice graphing server with many frontends available.
ganglia: a data collection system (client/server) with a really ugly frontend, but pretty fast.
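
The ok/warning/critical idea is easy to sketch. Nagios plugins report their status through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL), and the little Python function below maps a metric to one of those codes. The thresholds and the function name are just assumptions for the example, not part of any tool.

```python
# Exit codes used by Nagios plugins to report status.
OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(value: float, warn: float, crit: float) -> int:
    """Map a metric value to a Nagios-style status code.

    Assumes higher is worse (disk usage, load, latency) and warn <= crit.
    """
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK
```

For example, with `warn=80` and `crit=90`, a disk at 85% usage comes back as WARNING. A real plugin would also print a one-line status message and exit with that code.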

This is an important point: YOU CAN COMBINE SOME OF THEM!!! Yes, you could use collectd + graphite + nagios. Anyway, as usual you should take care of scalability on two points: 1) your monitoring must not overload your platform, and 2) your monitoring platform must be ready to scale with you.
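
Part of why these tools combine so well is that feeding them is simple. As a small sketch, this is how a custom metric could be pushed to Graphite from Python using its plaintext protocol, where each line is "path value timestamp" sent to the carbon listener (port 2003 by default). The metric path in the example is made up, and the host is whatever your carbon server is.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host: str, path: str, value: float, port: int = 2003) -> None:
    """Open a TCP connection to carbon and send a single metric line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(path, value).encode("ascii"))
```

Something like `send_metric("graphite.example.com", "servers.web1.load", 0.5)` would do it. Opening one connection per metric is fine for a handful of values; if you send a lot, batch several lines per connection so the monitoring itself does not overload the platform.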

Please, monitor, and enjoy understanding where your bottlenecks and your performance opportunities are. Remember that you need to collect and process data in order to have something to analyze and to make decisions.