- People that you are hiring
- Service providers
I use that order because I believe it is the correct one. In the end, an organization (company, enterprise, or whatever you want to call it) is a structure with people giving meaning to that structure, and the most important element IS THE PEOPLE.
The second point is about service providers, because many of the businesses we are building today rely on services that we hire. History teaches us that providers can fail. For example, if you have a blog on wordpress.com, they could suffer an outage and take your blog down. That is bad, but imagine having your core business app hosted in a place that goes down (at that point you could start crying). If you do not want to cry, what you need are really reliable providers.
Talking about providers, we expect help from them. A cleaning company that leaves more trash than it collects is not a good deal. Let's face it, we need more help than problems, and that, my friends, is a good indicator of the quality of a provider (problems over help: it has to be almost zero).
To be very honest, problems happen, and we will have to deal with them, but what we do not need (or want) is the same problems over and over.
I will give you two examples:
- AWS: they are the first provider of cloud computing services, and they have (in my personal opinion) one of the most amazing stacks of cloud-oriented services. Obviously, they will charge you for breathing if they get the chance. They have premium support, which can be really expensive; for me it could be something like USD 2000 monthly (OUCH). If you are not a member of the premium group, then you are screwed!! You can open a ticket and pray for a solution in less than 2-3 days, or post something in the AWS forum and, again, pray for a solution. For example, the AWS Management Console showed me that I had a pending "EVENT": an instance that was running on "DEGRADED HARDWARE". One of the AWS solutions is to Stop/Start the instance. Well, I did that... and the instance was stuck in the stopping state for 5 hours. Thankfully it was not a really important server; otherwise that would have meant 5 hours of downtime. Obviously, I reported the issue after 5 minutes, but the solution arrived a "little bit" later.
- Time Warner Cable (TWC): do I need to say more?? Anyway, I will. Imagine you have an Internet-based product, so you need a reliable connection. Well, TWC offers connections of 30Mbps/3Mbps, 30Mbps/5Mbps, 50Mbps/3Mbps and 50Mbps/5Mbps (something like that). Usually, if you get 50% of that, you are really lucky, and traceroutes can vary from 20ms per hop to 1200ms per hop within a five-minute period. By the way, if you call support, they will give you reasons like "it is related to solar flares" (really, they told me that once, and my connection is coaxial) or "it could be a problem with the AMPLIFIERS" (OMG, do I have to plug headphones into the router?? Where are the speakers?). Whatever they say, the final sentence is always the same: "It should be fixed in a while, wait for a couple of hours". In real life, that means something like "stop bothering us because we are busy playing solitaire".
Did those examples sound like reliable providers??
I can give you some personal keys:
- Try to build your apps on disposable servers, so you can get past any situation like the "DEGRADED HARDWARE" case. Volatile is key!
- If you cannot make it volatile, make it easy to recover, maybe with Amazon AMIs or any other kind of machine image, backups with fast recovery, and fast deploys. Chef is awesome, but a recipe that takes 1 hour to run is a really BAD option; speed it up.
- Get redundancy; do not put all your eggs in one basket. For example, if you are on AWS, use multiple regions and multiple availability zones in each region.
- If you can combine multiple providers, DO IT!!
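
To make the redundancy idea a bit more concrete, here is a minimal sketch in Python of how an app could pick the first provider that is actually answering. Everything in it is an assumption for illustration: the function names `http_ok` and `first_healthy`, the `/health` path and the example URLs are made up, and a real setup would more likely do this at the DNS or load balancer level.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts, HTTP errors
        # and malformed URLs; any of them means "treat it as down".
        return False


def first_healthy(endpoints: Iterable[str],
                  check: Callable[[str], bool] = http_ok) -> Optional[str]:
    """Return the first endpoint that passes the check, or None if all fail."""
    for url in endpoints:
        if check(url):
            return url
    return None


if __name__ == "__main__":
    # Hypothetical endpoints: the same app deployed on two different providers.
    providers = [
        "https://app.provider-a.example/health",
        "https://app.provider-b.example/health",
    ]
    print(first_healthy(providers))
```

The order of the list is the order of preference, so the second provider is only used when the first one stops answering. The `check` argument is there so you can swap in your own test (or a fake one) without touching the network.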