Thursday, October 31, 2013

a monitoring story with mongo MMS at Pixable.

A few hours ago I was talking to my boss about the benefits of using monitoring tools built by the same developers as the tools they monitor. The best example we found was MongoDB and MMS (MongoDB Management Service). We decided to write a short tale about it, and we want to share it with you:


"Every night I lived in fear. Sleeping was difficult, not because of nightmares but because of alerts on my cellphone saying that our API queues were growing, response times were spiking and everything was slowly falling apart.

The problem did not occur at the same time each day, which made it more difficult to debug.
Finally our super duper architect installed MMS. On the second day of using MMS I saw the light at the end of the tunnel. There was a clear spike in the page faults indicator at exactly 3:00am. Mongo was doing everything possible to keep working, but it eventually failed minutes or hours later; that is why we never got the alert at the same time.

So, easy, no? It's a cron job. We went to our code, looked for all the cron jobs running at that time and found one that looped through every single user in the system. But that query had a read preference of secondary only, so it could not be the one hurting the primary. So what was it?

After digging into what the cron job was doing, we found that for each user it ran a very, very simple query that hit random places in another collection, and this was making mongo page fault A LOT.

Eventually the paging pushed all the hot data out of RAM on the primary and the cluster became a slow wagon.

We switched that second query to run on the secondaries, and now we are a happy family again. I sleep like a baby now."
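
To picture what happened on the primary, here is a toy simulation in Python. It is not MongoDB code and nothing in it comes from our system: `LRUCache`, `simulate` and all the sizes are made up for illustration. RAM is modeled as a small least-recently-used cache of pages, the normal traffic keeps touching a small hot set, and the "cron job" touches random pages of a much bigger collection.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache standing in for the RAM that holds database pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.faults = 0

    def touch(self, page):
        """Access a page; return True on a hit, False on a page fault."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return True
        self.faults += 1
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return False


def hit_rate(cache, pages):
    """Fraction of accesses that were served from the cache."""
    hits = sum(cache.touch(page) for page in pages)
    return hits / len(pages)


def simulate(seed=0):
    """Return the hot-data hit rate before and after a random scan."""
    rng = random.Random(seed)
    cache = LRUCache(capacity=1000)   # hypothetical RAM size, in pages
    hot_pages = list(range(500))      # hypothetical working set of the API

    for page in hot_pages:            # warm up: the hot data fits in RAM
        cache.touch(page)
    before = hit_rate(cache, [rng.choice(hot_pages) for _ in range(2000)])

    # The "cron job": one simple query per user, each landing on a random
    # page of a much larger collection.
    for _ in range(5000):
        cache.touch(rng.randrange(10_000, 1_000_000))

    after = hit_rate(cache, [rng.choice(hot_pages) for _ in range(2000)])
    return before, after
```

Calling `simulate()` returns the hit rate of the normal traffic before and after the scan. Before the scan every access is a hit; after it the hot pages have been evicted and have to be faulted back in, which is the kind of spike MMS showed us.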


Our final message is that you should use the tools your providers give you to detect problems or opportunities to improve. Do not get me wrong: like any tool, it could be improved (and knowing the MongoDB people, I am sure it will be). We love MongoDB and we believe in what MMS can do.

Check it out at http://mms.mongodb.com

Thanks to Julio Viera for working with me on this post.

Tuesday, August 13, 2013

being "agile" does not mean being "scatterbrained"

I am not (and I do not try to be) a Scrum Master or any kind of agile guru; for that I have friends like Jesús Enrique Méndez (http://agileinpills.wordpress.com and his initiative http://www.agileopenspace.org), Gustavo Bonalde (http://gbonalde.blogspot.com/) or Carlos Gabriel (https://twitter.com/CarlosGabriel_). Still, I am a guy who has spent a lot of time researching, studying, silently watching how people do things and secretly running some experiments, so I have developed my own opinions.

First, I believe that we need to be really agile in our processes. Any kind of waste should be minimized, so why should we keep doing things in the most costly (time-wise) way?

Second, we need to be productive. Spending a year developing a new feature is unacceptable; even if you are developing an operating system, the process should not take that long.

Third, your business model or your level of economic maturity could give you an advantage over other companies, but that does not mean that you are allowed to make mistakes. And if you are a small company or a startup, you will need to make proper use of every resource you can get.

With those three ideas in mind, I often see people start being "agile" by ignoring any kind of design or, worse, they stop thinking about the problem or the possible solutions and start doing "anything" to solve it. I do not want to talk about efficiency or anything related to it, but let's take some time to understand the situation, let's draw something, let's get to the causes of our problem, and then we can solve it by looking for a good/optimal solution. If you want to be "agile" you can start by using the simplest solution that works instead of the most complicated one. At this point my two pieces of advice are:

remember that thinking and designing are not sins.

keep it simple, do not oversize the problem.


I also see people try to shorten the development cycle by building something and sending it to the production environment as soon as they finish; then everybody goes crazy because something is failing. Well, when I was a young developer I learned about an (almost magical) technique to minimize trouble. It is called TESTING. We cannot be in a rush and forget about tests; we HAVE to test our code. Obviously something can change and something could fail, but that HAS to be the exception and not the rule. I have seen people who are unable to send a feature to production without tons of errors that then break everything. Today we have tools to automate our tests; we can use Jenkins or any other, but we have to use them.

clumsy is not agile.

lazy is not agile.

testing is important for everybody; we know that things can fail, but that should be the exception.

you can be agile while testing; agile does not mean NO TESTING.
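
To show how small an automated test can be, here is a minimal sketch using Python's standard `unittest` module. The function `normalize_email` and its rules are hypothetical, invented only for this example; the point is that the check is written once and a machine repeats it on every change.

```python
import unittest


def normalize_email(raw):
    """Trim whitespace and lowercase an address; reject obviously bad input."""
    email = raw.strip().lower()
    if email.count("@") != 1 or email.startswith("@") or email.endswith("@"):
        raise ValueError("not an email address: %r" % raw)
    return email


class NormalizeEmailTest(unittest.TestCase):
    def test_trims_and_lowercases(self):
        self.assertEqual(normalize_email("  Bob@Example.COM "), "bob@example.com")

    def test_rejects_missing_at_sign(self):
        with self.assertRaises(ValueError):
            normalize_email("bob.example.com")


def run_tests():
    """Run the suite and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeEmailTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A Jenkins job (or any other automation tool) could run this with `python -m unittest` on every commit and refuse to deploy when a test fails.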

Finally, I have seen people who, as soon as they start getting errors, start blaming every single API, third-party tool or even the astrological configuration of Mercury. People, let's face it: usually when something goes wrong we are the ones responsible. If we work in an orderly way, document our code and split problems into smaller problems, we can be agile when replacing broken pieces, fixing bugs or improving features.

Well, my friends, if you are trying to become an agile developer then you have my blessing. Just do it, but do it the right way; do not become "scatterbrained" trying to speed up development.


Thursday, August 1, 2013

chromecast is here!!

Last weekend I got my brand new Google Chromecast. First, it was a miracle, because I got it just three days after the public release. THANKS AMAZON.COM!!! You are such a good friend!!

I will tell you my story with the device, which got me so excited that I took pictures of the process.


The packaging, as usual for Google products, was clean, small and simple; basically nothing fancy (btw, I like that).









Then I had my first encounter with the device. I have to say that I bought it without seeing or touching it, so it was a mystery to me, but I was pleasantly surprised. It is small enough to be very portable, so you can take it with you anywhere you go.








Using it is really simple: plug it into any standard HDMI port. Attention: the HDMI port will not power the device, so you will need to power the Chromecast yourself. Do not worry about that, because the device comes with a USB cable + power adapter, so you can get power from a USB port on the TV or from any power outlet.








Now you need to set up the device, and it is extremely easy. First go to https://cast.google.com/chromecast/setup and click the download button; the magic starts there.



After you install the driver, the setup will detect your wireless connection and ask you for a name for the device.


After that, you will be prompted to install the Google Chrome extension (DO IT).


You know what, that was it!! Your Google Chromecast should be ready to use!!! HOORRAH!!!



So, after playing a lot with the device, the good is:

  1. Extremely portable device
  2. Extremely easy to set up
  3. It is multi-platform; it really works with Apple devices, Android devices, laptops, tablets, etc. As a friendly reminder, the Apple TV only works with Apple devices.
  4. Native apps are the ones that connect to the device, so you do not need to purchase anything special to use a new phone or tablet; just get the YouTube app, the Chrome app, etc. As a friendly reminder, Roku could become messy and expensive on this point.
  5. It is open, YES IT IS OPEN. If you want to develop an app to stream from Frostwire to the TV, you just need to use the API, so any content provider or content aggregator could build their own apps, or any third party could do it. Do I have to say something about Apple TV?? Well, I will... YOU CANNOT DO IT, you have to wait for updates.. Even Roku allows you to "create channels".
  6. It can broadcast in HD. With the Apple TV it was frustrating that AirPlaying a CBS video stopped every 5 seconds and every single document referred to "network capabilities". COME ON!!! I have a Linksys wireless router, I am not using some generic homemade solution..
  7. The device includes everything you need to use it; you do not need to purchase any extra stuff. For a limited time it included 3 free months of Netflix, which to me means the device cost about USD 8 instead of USD 35, because I will save around USD 9 a month.
  8. Did I say that it is cheap and works??
Not everything is perfect, so I will tell you what I think is bad:
  1. It runs a bit hot, but well, it works with electricity, and I am sure they know it is something to improve; it is not critical.
  2. If you are using a really old device, like a laptop with an 802.11 a/b card from 6 years ago, it could be slow and you will need to reduce the resolution.
In conclusion, we are talking about a killer device. Its openness creates a path for the creativity of any developer, who could imagine even a Raspberry Pi running Android connected to this device; this is lovely.. Finally, I had to do two more things: 1.- remove the Roku 3 from my "toys" wish list on amazon.com and 2.- look for someone who would like to buy my old Apple TV.

I really love my new toy!!





Friday, July 19, 2013

providers, providers, providers.. do not kill me, help me!

When you have a business, it is important to be careful with every aspect, but there are two that are really important to me:

  1. The people you hire
  2. Service providers
I use that order because I believe it is the correct one. In the end, an organization (company, enterprise or whatever you want to call it) is a structure with people giving meaning to it, and the most important element IS THE PEOPLE.


The second point is about service providers, because many of the businesses we are building today rely on services that we hire. History teaches us that providers can fail. For example, if you have a blog on wordpress.com, they could suffer an outage and take your blog down. That is bad, but imagine having your core business app hosted in a place that goes down (at that point you could start crying). If you do not want to cry, what you need are really reliable providers.

Talking about providers, we expect help from them; a cleaning company that leaves more trash than it collects is not a good deal. Let's face it, we need more help than problems. That, my friends, is a good indicator of the quality of a provider (the ratio of problems to help has to be almost zero).

To be very honest, problems happen and we will have to deal with them, but what we do not need (or want) is the same problems over and over.

I will give you two examples:

  • AWS. They are the leading provider of cloud computing services and they have (in my personal opinion) one of the most amazing stacks of cloud-oriented services. Obviously, they will charge you for breathing if they get the chance. They have premium support, which can be really expensive; for me it could be around USD 2000 a month (OUCH). If you are not a member of the premium group then you are screwed!! You can open a ticket and pray for a solution in less than 2-3 days, or post something in the AWS forum and again pray for a solution. For example, the AWS Management Console showed me a pending "EVENT": an instance was running on "DEGRADED HARDWARE". One of the AWS solutions is to stop/start the instance. Well, I did that... and the instance was stuck in the stopping state for 5 hours. Thankfully it was not a really important server; otherwise that would have meant 5 hours of downtime. Obviously, I reported the issue after 5 minutes, but the solution arrived a "little bit" later.
  • Time Warner Cable (TWC). Do I need to say more?? Anyway, I will.. Imagine you have an Internet-based product, so you need a reliable connection. Well, TWC offers connections of 30Mbps/3Mbps, 30Mbps/5Mbps, 50Mbps/3Mbps and 50Mbps/5Mbps (something like that). Usually, if you get 50% of that you are really lucky, and trace routes can vary from 20ms per hop to 1200ms per hop within a five-minute period. By the way, if you call support, they will give you reasons like "it is related to solar flares" (really, they told me that once, and my connection is coaxial) or "it could be a problem with the AMPLIFIERS" (OMG, do I have to plug headphones into the router?? Where are the speakers?). Whatever they say, the final sentence is the same: "It should be fixed in a while, wait for a couple of hours". In real life, that means something like "stop bothering us because we are busy playing solitaire".
Do those examples sound like reliable providers??

I can give you some personal tips:
  • Try to build your apps on disposable servers, so you can get over any kind of situation like the "DEGRADED HARDWARE" case. Volatile is key!
  • If you cannot make it volatile, make it easy to recover, maybe with Amazon AMIs or any other kind of machine image, backups with fast recovery and fast deploys. Chef is awesome, but a recipe that takes 1 hour to run is a really BAD option; speed it up.
  • Get redundancy; do not put all your eggs in one basket. For example, if you are on AWS, use multiple regions and multiple availability zones in each region.
  • If you can combine multiple providers, DO IT!!
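
The redundancy tips can be sketched in a few lines of Python. This is only an illustration under assumptions: the location names and the `is_healthy` check are hypothetical, and a real setup would do this with DNS, load balancers or health checks from the provider rather than a loop in a script.

```python
def first_healthy(locations, is_healthy):
    """Return the first location that passes the health check, or None.

    `locations` is an ordered list of names (regions, zones or providers),
    from most to least preferred. `is_healthy` is any callable that takes a
    name and returns True or False; a check that raises counts as a failure.
    """
    for name in locations:
        try:
            if is_healthy(name):
                return name
        except Exception:
            # A broken health check must not stop us from trying the rest.
            continue
    return None


# Hypothetical preference order: two regions of one provider, then another
# provider as the last resort.
PREFERRED_LOCATIONS = ["aws-us-east-1", "aws-us-west-2", "other-provider"]
```

For example, `first_healthy(PREFERRED_LOCATIONS, lambda name: name != "aws-us-east-1")` returns `"aws-us-west-2"`: when the preferred region is down, traffic still has somewhere to go. With a single basket the function could only return that basket or `None`.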



Thursday, July 11, 2013

get a backup before it's too late

Over the last years we learned to take care of our code by making backups in zip files. Then we evolved and started using SVN to control changes and keep a backup. Today we mostly use Git for version control, keeping the backups in a system that provides much more functionality around the main concept of a BACKUP.

That is really cool; if you are doing version control then you are doing something good. But suppose, for example, that you are running a website. You will have a backup of your code in a repository, and if you are using a Content Delivery Network (CDN) then you will have a backup of the files you are sharing, but what happens with your database?

Well, obviously you will need a backup of your database, but usually as soon as you start growing your databases get bigger and bigger, and really quickly you will be unable to do a dump of your DB (damn you, scalability). OK, now I would like to list some considerations to review before deploying a backup system:

  1. We should have zero downtime!! Basically, the operation of the system should not be affected by any backup.
  2. Optimal use of storage resources. If the database grows 100% weekly, then storing tons of megabytes of backups will become an issue. Incremental backups sound good.
  3. Backups should be done often; it is useless to have monthly backups if the business changes every day or every hour.
  4. Good use of processing time. If you need to do 4 backups a day and each backup takes 12 hours, then something is really wrong.
  5. Having the optimal system is irrelevant if you are not able to restore from a backup.
  6. If you have incremental backups, then you may have the option to restore to a point in time.
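
Points 2 and 4 are simple arithmetic, so here is a rough sketch of them in Python. The function names and the 10 GB starting size are assumptions made up for the example, and the incremental estimate assumes the database only grows by adding data (changed rows would add more).

```python
def full_backup_storage(initial_gb, weekly_growth, weeks):
    """Total storage used if a full copy is kept every week."""
    size = initial_gb
    total = 0.0
    for _ in range(weeks):
        total += size
        size *= 1 + weekly_growth
    return total


def incremental_backup_storage(initial_gb, weekly_growth, weeks):
    """Total storage for one full copy plus only the data added each week."""
    size = initial_gb
    total = float(size)
    for _ in range(weeks - 1):
        added = size * weekly_growth
        total += added
        size += added
    return total


def backups_fit_in_day(backups_per_day, hours_per_backup):
    """A backup has to finish before the next one is due to start."""
    return hours_per_backup <= 24 / backups_per_day
```

With a 10 GB database growing 100% weekly, four weeks of full backups take `full_backup_storage(10, 1.0, 4)` = 150 GB, while `incremental_backup_storage(10, 1.0, 4)` gives 80 GB. And `backups_fit_in_day(4, 12)` is `False`: four backups a day leave 6 hours for each one, so a 12 hour backup cannot work.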
Taking those points into account, I would like to give you two options based on the systems that I am managing at this time.

First, I love using MongoDB. The creators of MongoDB have an amazing set of tools available for everyone, called MMS. 10gen started offering a really good backup service; I have tested it and I have had really good results. I recommend it 100%.

Second, I am also a MySQL user. I have systems running in master/slave mode and I even have circular replication between two masters (this sounds really nice), so it is complex and it is BIG. Some time ago I was testing a tool from Percona called xtrabackup. I started testing it under a disaster recovery scenario and after that I stopped using it; not because of the tool, it was just laziness, but it is time to start testing again. This tool is amazing, it is like the backup system from heaven.

The price of the backups is not related to the backup itself, it is related to the price that you will pay if you do not have a backup.

I have given you several places to look and read. If you have any other option, please let me know. And enjoy the inner peace of having backups!



Wednesday, July 10, 2013

orchestration is one step for growing

When you are taking care of a platform, sometimes you need to run some tasks on a set of servers, and everything will go like a charm if you have a few servers. But imagine that your platform grows from 10 servers to 200; well, that could make your job a little bit awful.

Even if you have an image of the server and can create a new image every time you need a change, you will still need to replace all the running servers or coordinate an update. Obviously you cannot do that manually, so you need some kind of orchestration. Orchestration refers to the management and coordination of tasks in big systems.

At this point you will find many people suggesting tools. I would like to give an opinion, which by the way is not final because I am still testing. As usual I will give you choices, so you can walk whatever path YOU think is the right one.

Let's start with Puppet. This tool is awesome; you can do almost everything with it, monitoring and orchestrating, but let's be honest, it sounds like too much, and it is.. It will probably be awesome, but it will be like cooking a steak in a nuclear reactor. Anyway, you can give it a chance.

Second tool, Chef. It is really amazing how it can automate configuration and installation and keep track of configuration.. Actually, I am trying to fall in love with it, but our relationship is complicated; you need to be very pragmatic, practical and organized. When you use a tool like this, you need to know when to start doing things your own way. It is better to keep the recipes the original way, but any real-life chef knows that they can twist a recipe to make it taste the way they want.

My last option: do it yourself!! For example, I have built a set of scripts in python+fabric that allow me to do many things against a set of servers in AWS. I can define common tasks, like cleaning logs, and run them against all the webservers, or I can open a virtual console that lets me send custom commands to a set of servers, even in different regions. Actually, I love fabric, I think that framework is AWESOME.
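
To give an idea of the "do it yourself" route, here is a minimal sketch that uses only the Python standard library. It is not Fabric's API and it is not my actual scripts: the host names, the `SERVER_GROUPS` table and the log cleaning command are hypothetical, and `ssh_runner` assumes that the local `ssh` client can log in to each host with keys.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command):
    """Run a command on one host through the local ssh client."""
    result = subprocess.run(["ssh", host, command], capture_output=True, text=True)
    return result.returncode, result.stdout


def run_on_servers(hosts, command, runner=ssh_runner, max_workers=10):
    """Run one command on every host in parallel; return results per host."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}


# Hypothetical inventory: group name -> hosts, possibly in different regions.
SERVER_GROUPS = {
    "webservers": ["web1.example.com", "web2.example.com"],
}

# Hypothetical common task.
CLEAN_LOGS = "find /var/log/myapp -name '*.log' -mtime +7 -delete"
```

A call like `run_on_servers(SERVER_GROUPS["webservers"], CLEAN_LOGS)` would clean the logs on every webserver at once. The `runner` parameter is there so the same function can be tried with a fake runner before it touches real servers.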

You need to test every option yourself and take into account the size and characteristics of your business. There is no treasure map; there are clues and you have to build your own map.

Enjoy the ride!


Tuesday, July 9, 2013

Simple, Standard and Open

Today I was at Google NYC attending the Meetup about the Crisis Response project. I learned a couple of things there that I would like to share with you, and I also remembered some old concepts. Let's start by recognizing that some people at Google are using their knowledge and some cool tools to help people; if you want to know a little bit more about that, check the google.org site. GOOD JOB PEOPLE!!

Then, at the beginning of the tech talk, the presenter showed a slide with these 3 words:

SIMPLE

STANDARD

OPEN

Reading that, I came up with the idea of sharing my opinion about those words.

Talking about SIMPLE: we usually start building stuff, coding, developing, and we walk around many options or solutions. Well, I will tell you something, you should try the simplest one first. You do not need to create a time machine to get the current system time... please KEEP IT SIMPLE!!

In second place, let's be standard. If there is a proven solution, please use it; after your product is up and running you will have the chance to invent things to replace the pieces that could work better. But the secret is here.. USE THE STANDARDS. If the specification of a product says that you have to do X, then why are you trying to do 1/X? If the language is object oriented, why are you trying to write procedural code? Be STANDARD.
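
As a tiny illustration of simple and standard together, this is how the current system time can be obtained in Python with nothing but the standard library. The two function names are only for the example.

```python
from datetime import datetime, timezone


def current_time_utc():
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.now(timezone.utc)


def current_time_iso():
    """Return the current UTC time as an ISO 8601 string."""
    return current_time_utc().isoformat()
```

No time machine required: one standard call, and the result is in a standard format that any other system can parse.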

Openness does not refer only to open source products; it means being open to collaboration and open to collaborate. If you have a problem, ASK; if you found a solution, PUBLISH it. Also remember that most of the time you are working with other people, so please be kind and comment your code, create READMEs, etc.