Saturday, July 3, 2010

How JBoss Netty helped me build a highly scalable HTTP communication server for GPRS

It has been nearly 1 year and 3 months since I discovered JBoss Netty. We were building an end-to-end bus transportation solution, and the server side required a highly scalable communication layer that could support tens of thousands of concurrent transportation devices, with a throughput above 500 requests/sec, posting transportation-related data to the server over GPRS using HTTP.

In the first version of this solution, for which I was the architect of the communication server, I had already burnt my fingers by using a commercial application server's servlet container and naively building upon a synchronous architecture, which could not scale beyond 500 devices at nearly 25 requests/sec! Of course, a lot more was wrong with it than just using a servlet container - the entire architecture was synchronous and there were other humongous mistakes, but that's another story.

This time we were making a radically different architecture under the guidance of a seasoned architect. As I had built the communication server for the first generation of the solution and had already learnt a lot from our experiences of making the solution live, I was determined to get it right this time. In keeping with the overall architecture of the solution and for supporting this high scalability and throughput, I decided to use an asynchronous HTTP engine based on NIO.
Why NIO? Well, my fascination with NIO started in early 2006 when I came across the following articles published by O'Reilly: 'Introducing Non blocking sockets' and 'Building Highly Scalable Servers with Java NIO'.

However, my project manager-cum-architect at the time (early 2006) was perturbed by the idea of trying out a 'new' and 'untested' architectural approach, and as I did not have enough architecture experience at the time to counter the arguments, I gave in and used a regular Core-J2EE-patterns-inspired 3-tier architecture utilizing a commercial application server's servlet container.

When the first generation of the complete solution was built, deployed and made live, I became more and more convinced that my first inclination to use NIO was justified.

Here were the reasons-

  • I was building a communication server which used HTTP as a means of transporting the actual application data. I was not putting up a web site for human users with an HTML GUI. As such I did not need lots of the facilities of the servlet containers and was in fact hampered by a few restrictions as I will elaborate below.
  • Pre-Servlet 3.0 specifications support only a synchronous architecture with a thread-per-connection or thread-per-request model. The Servlet 3.0 specification has only recently been released and the majority of application servers do not yet comply with it. For more information, read the article Asynchronous processing support in Servlet 3.0. Our solution was built and deployed much before that. Of course, application servers also have non-servlet-standard APIs for asynchronous HTTP in their servlet engines (read the article Asynchronous HTTP and Comet architectures). But I was not aware of them at the time.
  • The transport medium was GPRS, whose bandwidth is quite low. Moreover, in developing countries the quality of GPRS is not as consistent as in Europe or the United States. Furthermore, our devices were mounted on vehicles that could move at very high speeds and change cells, so the GPRS connection could be quite fragile. Under the hood, GPRS is a packet-oriented service offered on 2G networks and as such manages to send only a few packets at a time, and only so long as the communication channels are not used up by voice communications. That means an HTTP request would arrive over GPRS in packets, chunks or bursts. Often we would also get HTTP requests that were incomplete and would eventually time out. For example, if the GPRS device was trying to send 512 bytes, our server might get only 78 bytes and then the request would time out. In the thread-per-request model, this means that one precious container thread would be blocked just trying to read the complete request until it eventually timed out. Depending on the read timeout value set in the application server, the thread could be blocked for as much as 30 secs, which is a huge amount of time. What we required was an engine which could do a non-blocking read, assemble the entire HTTP POST request in memory and only then call a servlet or a handler to do the rest of the processing. In other words, we needed the proactor pattern instead of the reactor pattern. And while lots of application servers use NIO or native IO, they internally use the reactor pattern!
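
To make the last point concrete, here is a minimal Java sketch of assembling a request that arrives in bursts. It is only an illustration, not the code from the project and not Netty's decoder. The class name HttpRequestAssembler and its methods are hypothetical, it assumes the body length is declared by a Content-Length header, and it ignores chunked transfer encoding, size limits and timeouts. Each burst of bytes is appended to an in-memory buffer, and the application handler would be invoked only once offer returns true.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: buffers bytes as they trickle in and reports when a full HTTP request is present. */
public class HttpRequestAssembler {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** Appends one burst of bytes; returns true once the headers and the whole declared body have arrived. */
    public boolean offer(byte[] chunk) {
        buffer.write(chunk, 0, chunk.length);
        return isComplete();
    }

    public boolean isComplete() {
        return completeBody() != null;
    }

    /** Returns the request body; only legal after the request is complete. */
    public String body() {
        String body = completeBody();
        if (body == null) {
            throw new IllegalStateException("request is not complete yet");
        }
        return body;
    }

    /** Returns the body if everything has arrived, otherwise null. Re-scans the buffer each time for simplicity. */
    private String completeBody() {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
        String text = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        int headerEnd = text.indexOf("\r\n\r\n");
        if (headerEnd < 0) {
            return null; // headers themselves are still incomplete
        }
        int bodyStart = headerEnd + 4;
        int contentLength = contentLength(text.substring(0, headerEnd));
        if (text.length() - bodyStart < contentLength) {
            return null; // part of the body is still missing
        }
        return text.substring(bodyStart, bodyStart + contentLength);
    }

    /** Reads the Content-Length header; assumes a well-formed number and treats a missing header as 0. */
    private static int contentLength(String headers) {
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        HttpRequestAssembler assembler = new HttpRequestAssembler();
        byte[] first = "POST /data HTTP/1.1\r\nContent-Length: 11\r\n\r\nhello"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] second = " world".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(assembler.offer(first));  // false: only 5 of the 11 body bytes have arrived
        System.out.println(assembler.offer(second)); // true: the request is now complete
        System.out.println(assembler.body());        // hello world
    }
}
```

In a real NIO server there would be one such buffer per connection, filled from the selector loop whenever the socket becomes readable, so that no thread waits on a slow GPRS client; the cost is the memory held while a request is still incomplete.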

The case for NIO was thus clear. But I still had a hangover from the Core J2EE patterns and the numerous MVC frameworks built around servlets for the presentation tier, so I would probably have used the NIO frameworks in the typical synchronous style. Then along came our seasoned architect, who reviewed our solution and proposed a radically different architecture: distributed and asynchronous. I realized that if I used a synchronous approach and still wanted to guarantee that the data sent by the clients had been successfully processed by the backend, I would have to block my thread until the backend had finished processing the data sent by the vehicular devices.

The other possibility would have been to use reliable JMS, but with the store-and-forward technique of reliable JMS, we would probably have sacrificed the desired high scalability and throughput for reliability. Besides, that was rather like arm-twisting the architecture to comply with servlet container constraints.

What I needed was an HTTP engine which would not impose the following restriction of pre-Servlet-3.0-

Each request object is valid only within the scope of a servlet’s service method, or within the scope of a filter’s doFilter method. Containers commonly recycle request objects in order to avoid the performance overhead of request object creation. The developer must be aware that maintaining references to request objects outside the scope described above may lead to non-deterministic behavior.

An engine that would allow me to do this-

That way, there would be no scalability bottleneck in accepting more requests from larger number of clients and reliability would have been sufficiently achieved as well.

Moreover, I needed this in late 2008 to early 2009, when the final Servlet 3.0 specification was not yet released and there was no reference architecture either.
Relying on my NIO experience, I proceeded to evaluate the following three NIO frameworks: Apache MINA, Grizzly and JBoss Netty.

I had experience building NIO TCP/IP servers with Apache MINA, but the MINA releases were taking a long time and the HTTP protocol was not directly supported. Yes, there had been an asyncweb project from safehaus built on MINA 0.8, but the project had been migrated to Apache and was not developed further to be compatible with the latest versions of MINA. It was a big risk using Apache MINA + Asyncweb.

Grizzly, I found, had too many options - plain NIO framework, Embeddable HTTP Web Server, Embeddable Comet WebServer, Embeddable Servlet container. I was simply baffled. It seemed a steep learning curve and I had very little time. Moreover, this post from Grizzly's creator alerted me that Grizzly would, by default, use the reactor pattern rather than the proactor pattern, which I was not comfortable with for my scenario.

In contrast, JBoss Netty's HTTP support was short and sweet - with no-frills. All I had to do was to use the ready-made HTTP Protocol encoder and decoder and attach my own HTTP request handler. After all, HTTP for my scenario was just a protocol over TCP/IP sockets carrying the application data in its body.

Within no time, I had my prototype ready and I was confident that I was on the right track. The performance test reports were extremely heartening. But could Netty fit in my scenario?

Turns out that it far exceeded expectations. That simple HTTP communication server I built using Netty could sustain the concurrent load of 10000 clients with the desired throughput of 500 requests/sec on a single desktop PC with 1 GB RAM and 1 CPU in our lab. And I am not kidding! It was celebration time for all of us. The memory utilization was well within the 812MB heap size we had granted to the JVM and the average CPU utilization was 78%. In other words, we were utilizing the full resources that the machine offered.

My reasoning for building a scalable HTTP communication server for GPRS devices with NIO and asynchronous HTTP was thus vindicated by using the JBoss Netty framework. Thanks to Trustin Lee for making this splendid framework.


  1. To the point and technically very informative!!

  2. Great article and thanks for sharing. Recently, I also questioned (and am still questioning) the Servlet container-based approach when I needed a custom server/http web server and proxy. I evaluated MINA, Netty and xLightWeb. I ended up choosing xLightWeb because of its extensive support for HTTP.

  3. Just curious because I also encounter similar situation.

    Is the system based on Windows? The docs say, for example, that Linux/Unix does not support asynchronous socket I/O.

  4. It was sufficient for me that the IO was non-blocking, even if not completely AIO. But yes, the system was on Windows.

  5. Hi Archanaa,
    It is a Great article and thank you for sharing it.

  6. Great Article. Would like to see how the services were deployed. App Servers give some tools to allow monitoring and such. So, any suggestions on deployment aspects would be great help.

  7. Nice article. I wish I could find some basic documents about Netty. I am not a pro Java developer and am finding it a little hard to grasp what a Channel, a Buffer etc. are. I am trying to learn, but both the MINA and Netty documents assume that you already know these terms and NIO.

  8. Regarding deployment - the custom engine that I made with Netty was in plain Java. For monitoring, it was simple enough to maintain my own statistics like number of current connections, details of the connections etc. A little more work, but it pays because one is able to maintain and collate exactly all the information one needs per the application. These statistics can be exposed with a simple Swing GUI or exposed using standard JMX connectors that come with the JDK or application server runtimes.

    Regarding an NIO primer, there are several good articles that clear up the concepts of NIO, but my favorites are the ones whose links I have given in the article above - Introducing Non blocking sockets and Building Highly Scalable Servers with Java NIO.
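
    As an illustration of the monitoring approach described in this reply (keeping one's own counters and exposing them through the JMX support that ships with the JDK), here is a minimal Java sketch. It is an assumption about how this could be done, not the code of the actual server: the names ServerStatsDemo, ServerStats, ServerStatsMBean and the object name used in main are all hypothetical. It only uses the standard javax.management and java.lang.management APIs.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ServerStatsDemo {

    /** Management interface. A standard MBean interface must be public and named <Class>MBean. */
    public interface ServerStatsMBean {
        int getCurrentConnections();
        long getTotalRequests();
    }

    /** Hypothetical statistics holder; the server's handlers would call the three update methods. */
    public static class ServerStats implements ServerStatsMBean {
        private final AtomicInteger currentConnections = new AtomicInteger();
        private final AtomicLong totalRequests = new AtomicLong();

        public void connectionOpened() { currentConnections.incrementAndGet(); }
        public void connectionClosed() { currentConnections.decrementAndGet(); }
        public void requestHandled()   { totalRequests.incrementAndGet(); }

        @Override public int getCurrentConnections() { return currentConnections.get(); }
        @Override public long getTotalRequests()     { return totalRequests.get(); }
    }

    /** Registers a new ServerStats with the JVM's platform MBean server under the given name. */
    public static ServerStats register(String objectName) {
        try {
            ServerStats stats = new ServerStats();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, new ObjectName(objectName));
            return stats;
        } catch (Exception e) {
            throw new IllegalStateException("could not register " + objectName, e);
        }
    }

    /** Reads an attribute back through JMX, the same path a tool such as JConsole would use. */
    public static Object read(String objectName, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(new ObjectName(objectName), attribute);
        } catch (Exception e) {
            throw new IllegalStateException("could not read " + attribute, e);
        }
    }

    public static void main(String[] args) {
        String name = "example.commserver:type=ServerStats"; // hypothetical object name
        ServerStats stats = register(name);
        stats.connectionOpened();
        stats.requestHandled();
        System.out.println(read(name, "CurrentConnections"));
        System.out.println(read(name, "TotalRequests"));
    }
}
```

    Atomic counters are used because connection and request events would typically be reported from several I/O threads at once.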

  9. Thanks that was a great article!

  10. Good reading, thanks.
    But there is nothing scalable in this article.

    1. Rather silly to post non-constructive criticism methinks. The writer has at least shared what he's done. Why not be brave and post under your own name geezer?

  11. Sorry , that should read what she's done.

  12. Because of the decoupling nature, IMO, JMS is more scalable than NIO-only from architecture point of view. However, if physical scalability is available (i.e. more boxes are available), then NIO-only would be fine too. If only the CPU's are not wasted, the result will be the same.

  13. Doesn't Netty also use the Reactor pattern, as MINA does?

  14. Hi, my definition of Reactor and Proactor may be different from the academic definition, in which case please do feel free to correct me :-)
    The article 'Do we really need Servlet Containers always? Part 2' details how I view reactor and proactor, especially for HTTP.

    In short, my definition of proactor (or at least pseudo-proactor) for HTTP is that - the container or HTTP software API layer reads the complete request into (user-defined) buffer and then calls the application handler to handle it.

    Whereas the reactor one is that - the container or HTTP software API layer calls the application handler when the first few bytes of the request arrive and then it is the job of the application handler to synchronously (or blockingly) read the complete HTTP request from the socket.

    In the case of GPRS, if the first few bytes of the request arrive, it doesn't mean that the rest of the request will follow smoothly and immediately - there can be pauses due to the nature of GPRS. So in the proactor/pseudo-proactor, the HTTP layer keeps reading whatever bytes it gets and concatenating them into a buffer in a non-blocking manner. Only when it sees that an HTTP request has been fully read (by virtue of Content-Length or other means) does it hand the complete request, in a buffer, to the application layer to handle. Thus, there is minimal blocking.

    Of course, at the expense of some memory consumption while the request is being concatenated and assembled.

    1. Just to add to that, most JavaEE servlet containers (of various application servers) seemed to do it in the reactor fashion - that is, instead of assembling complete request in-memory, they call the servlet when the first hint of a new request comes and then it is the job of the application's servlet to read the request bytes from the actual socket. At least that was the case in late 2008. I don't know if newer versions of servlet containers of various application servers give a configuration parameter to allow us to specify whether we want the HTTP request to be pre-read before the servlet is called or not.

    2. Hello,
      I am trying to build an application for the very same purpose you did, but I don't understand Netty that well. I would like to know if you could give some documentation regarding pools and work groups, because I don't get how they work. Or could you tell me how to start? I can't find any tutorials or anything that would explain them. Thanks a lot in advance, my mail is

  15. Hi, just to clarify: if a GPS server is developed with NIO.2 asynchronous I/O, will it not run as expected on Linux/Unix and work only on Windows servers? If so, how can one achieve the same performance on non-Windows servers?

  16. Really nice article. Very informative.