Going for production and scalability

During last week’s CS3216 class, our very own SoC IT Architect, Zit Seng, gave us a really interesting and useful talk about production systems. He explained the gap between a system in development and a system in production, how the two differ, and what we should take care of.

This is quite useful, as most of us have no experience developing large-scale software systems. Our assignments and school projects usually target only a small number of users, definitely fewer than a hundred (including you and your teammates, your TA, and some juniors who are searching for the assignment solutions).

Here is a typical scenario when things go wrong:
Usually we develop our code on localhost, and it works! Next we deploy to EC2 or DigitalOcean, type the URL into the browser, and expect it to work as well. It does work! So we just leave our web app there and have faith that it will also work for everyone else.
Then let’s suppose that one day your app becomes super popular, and suddenly thousands of people are using it at the same time. Boom! You wake up every morning with your mailbox full of users’ complaints: “your app sucks and can’t be used at all”.

You are confused. You open your laptop and test your app on localhost, and everything just works. You then carefully redeploy your web app to the server, and it opens fine as well. Then you comfort your users: “don’t worry, the app is working like a charm again now”.
Then the next day, your nightmare comes back and your app breaks again. You are totally lost and the story goes into a loop.

What could be happening here?

Apparently, either your app or your server is not ready for production with large traffic. This is what we mean by “scalability”. Your system may work perfectly on your local machine with a single user, but fail when many users hit it at the same time.

Now, what can you do?

Firstly, try to optimize your application.
For example, say you build a data processing system, and you decide to slack a little and use bubble sort to sort the incoming data. With quick sort, the operation might take only 1 second. With the poor bubble sort, it takes 10 seconds to sort all the data. Now suppose you have 100,000 users all sorting their data at the same time. That’s 100,000 seconds of CPU time vs. 1,000,000 seconds; your CPU simply can’t keep up with that much work, requests pile up, and your system crashes.
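To make the sorting example concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the list size and the helper names (`bubble_sort`, `time_sort`, `total_cpu_seconds`) are made up for this post, and Python's built-in `sorted` (Timsort, also O(n log n)) stands in for the quick sort. The actual timings will differ from machine to machine.

```python
import random
import time


def bubble_sort(items):
    """Return a sorted copy using bubble sort (O(n^2) comparisons)."""
    data = list(items)
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return data


def time_sort(sort_fn, items):
    """Run sort_fn on items and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = sort_fn(items)
    return result, time.perf_counter() - start


def total_cpu_seconds(seconds_per_request, users):
    """Total CPU time needed if every user triggers one sort."""
    return seconds_per_request * users


if __name__ == "__main__":
    random.seed(0)
    sample = [random.randint(0, 1_000_000) for _ in range(2_000)]  # hypothetical input size
    slow_result, slow = time_sort(bubble_sort, sample)
    fast_result, fast = time_sort(sorted, sample)
    print(f"bubble sort: {slow:.4f}s, built-in sort: {fast:.4f}s")
    # The numbers from the example above: 1s vs. 10s per request, 100,000 users.
    print(total_cpu_seconds(1, 100_000), "vs.", total_cpu_seconds(10, 100_000))
```

The gap between the two timings grows quickly as the input gets larger, and that gap is then multiplied by the number of users.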
Another example comes from my own recent experience. Recently I was busy helping a start-up build a real-time application. We used a Socket.IO server to keep our clients updated with new real-time data and display it nicely in the web app. The app was implemented on time, everything was functioning well, and it seemed quite promising. We dockerized our application, deployed it on staging servers, and it was up and running. Everything looked really good, until we did a load test. The server slowed down significantly after a certain number of users, and crashed soon after that.

What happened?
Well, the idea is simple. Socket.IO is a super handy real-time library for updating connected clients with data. The server can push updated data to all connected clients whenever the data changes. The data we sent to each client was around 50KB. Seems very small, right? Only 50KB, less than a normal picture. But if you do the math, 50KB per user with an expected 100,000 users (this is an app for displaying LIVE DOTA2 match data) and one push per second, the rate would be 50KB * 100,000 / 1024KB/MB ≈ 4883MB/s, or roughly 5GB/s. Your server would be sending out about 5GB of data every second, and that is scary. Either the server’s bandwidth gets saturated almost immediately and the application is not “real-time” at all, or your VPS provider shuts down your server and sends you a warning letter for exceeding the allowed traffic. Either way, we needed to reduce the size of the data, or think of a better way to do it (in the end, we did change how we implemented it, but that’s another story :D).
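The back-of-the-envelope math above is easy to put into a few lines of Python. This is just a sketch: the function name `outbound_mb_per_second` is my own, and it assumes one push per second to every connected client. The 5KB payload in the second call is a hypothetical number, only there to show how the traffic shrinks with the payload.

```python
def outbound_mb_per_second(payload_kb, clients, pushes_per_second=1.0):
    """Estimated outbound traffic in MB/s when every client receives every push."""
    return payload_kb * clients * pushes_per_second / 1024


# The numbers from the story: 50KB per push, 100,000 expected users.
rate = outbound_mb_per_second(50, 100_000)
print(f"{rate:.0f} MB/s, about {rate / 1024:.2f} GB/s")

# Hypothetical: the same audience with a 5KB payload needs a tenth of the bandwidth.
smaller = outbound_mb_per_second(5, 100_000)
print(f"{smaller:.0f} MB/s")
```

Since the traffic is simply payload size times clients times push rate, shrinking any one of the three reduces the load by the same proportion.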

Those are just two examples of how your application could be optimized to ease scalability issues. But usually, that alone is not enough. That brings us to the next part, which Zit Seng also mentioned: scaling your servers.

There are two kinds of scaling: horizontal and vertical.
Vertical scaling is easy and crude: just add more resources to your server. Your CPU is not fast enough? Change to a better and more expensive one. You don’t have enough memory? Here are 64GB more. You want larger storage? Here are two 2TB SSDs. In general, vertical scaling improves the performance of your software on a single machine, which means you may have room to take in more connections.
Horizontal scaling is trickier and more sophisticated. It is about connecting multiple computers together and making them operate as a cluster so they can handle larger computing tasks. In the lecture, Zit Seng showed us a picture of a cluster of 256 CPUs starting up together in SoC. For example, you can scale your web servers from one to three instances, each handling requests separately. You can also add a load balancer in front of your servers so that no single server is overloaded. This way, your app can take much more traffic than it could on a single instance. However, this solution needs more experience, as it requires quite a lot of configuration, and you must make sure your web app is developed as a distributed application for it to work well. (For example, Node.js has a cluster module to assist you with running multiple instances.)
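To give a feel for what a load balancer does, here is a toy round-robin sketch in Python. It is not how a real load balancer is implemented (in practice you would use a dedicated proxy or your cloud provider's load balancer); the class `RoundRobinBalancer`, the helper `distribute`, and the server names `web-1` to `web-3` are all made up for illustration. It only simulates handing requests to three instances in turn.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per incoming request."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._servers)


def distribute(requests, servers):
    """Simulate routing `requests` requests and count how many each server gets."""
    balancer = RoundRobinBalancer(servers)
    return Counter(balancer.pick() for _ in range(requests))


# Hypothetical setup: three web server instances sharing 9,000 requests.
load = distribute(9_000, ["web-1", "web-2", "web-3"])
print(load)
```

With round-robin, each of the three instances ends up with a third of the requests. Note that this only works well if any instance can serve any request, which is why the app has to be written with distribution in mind (for example, no session data kept in one instance's memory).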

In conclusion, production systems may feel far away from us as students, but we definitely interact with all kinds of production systems every day, and we know what it is like to use one that has failed. As CS students, we should keep scalability in mind while developing our applications, and always consider what we can do to make the app work well even under heavy traffic.