Saturday, July 10, 2010

Do we really need Servlet Containers always? Part 2

Communications and Multithreading Support

Application servers and Servlet containers do all the hard work of opening server sockets, accepting client connections, and decoding and encoding HTTP requests and responses. But the internal multi-threading and communication strategy varies widely from one server to another, and with it performance and scalability. What’s more, you are left to the mercy of the container: if you don’t know the internal details or read the fine print, you may choose a container that is sub-optimal for your scenario.
  • For example, earlier versions of some containers used a thread pool that allocated one thread per client socket connection and recycled the thread once the connection closed. This caps the number of clients that can be supported, since beyond a few hundred threads the context-switching overhead becomes severe. This model has been abandoned for good reason in newer versions of the same containers. Most application servers and containers now provide the thread-per-request model, where each HTTP request is read and processed by a thread from a thread pool, and the thread is returned to the pool once the request has been processed. But this goes to show that not all application servers and containers are made equal.
  • The mechanism used by the server to read the request as part of the socket communication may also vary. Servers can adopt Native IO or Java NIO mechanisms to implement either the reactor pattern or the proactor pattern. The proactor pattern is more sophisticated, scalable and efficient than its reactor counterpart. True proactor implementations require extensive OS support and Native IO. Using Java NIO’s non-blocking read, a near-Proactor implementation can be achieved.
With respect to HTTP, a reactor implementation of a container/server is one where requests are read as follows:

In this case, whenever a client socket channel is opened at the server, a read-interest is registered with an event de-multiplexer, which monitors the read-interests of all client socket channels in the same way. Whenever a read event occurs on a channel, the de-multiplexer temporarily de-registers the read-interest on that channel and sends a read event notification, with a reference to the channel, to a thread pool.

This thread-pool invokes the Servlet’s methods and it is in the context of these threads that the Servlet reads an HTTP request from the socket. This read is typically blocking but has a maximum read timeout. After the read and request processing is done, the read-interest is once again registered with the event de-multiplexer for the channel. The thread gets recycled for handling more such requests.
So in effect, it becomes the thread-pool’s responsibility to read from the socket.
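
The reactor flow described above can be sketched with Java NIO. This is only a minimal illustration, not how any particular container is implemented: the class name `ReactorSketch`, the pool size, the loopback echo "protocol" and the `demo()` helper are all assumptions made for the example. It also simplifies one point: the worker performs a single non-blocking read, whereas the containers discussed here typically do a blocking read with a timeout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReactorSketch implements Runnable, AutoCloseable {
    private final Selector selector;             // plays the event de-multiplexer
    private final ServerSocketChannel server;
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    ReactorSketch() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client == null) continue;
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        key.interestOps(0);                 // de-register read interest
                        workers.execute(() -> handle(key)); // the pool does the read
                    }
                }
            }
        } catch (IOException | RuntimeException e) {
            // selector closed or pool shut down: the loop ends
        }
    }

    private void handle(SelectionKey key) {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            int n = ch.read(buf);                // the read runs on the worker thread
            if (n < 0) { ch.close(); return; }   // client closed the connection
            if (n > 0) {
                buf.flip();
                String request = StandardCharsets.US_ASCII.decode(buf).toString();
                ByteBuffer reply = ByteBuffer.wrap(
                        ("echo:" + request).getBytes(StandardCharsets.US_ASCII));
                while (reply.hasRemaining()) ch.write(reply);
            }
            key.interestOps(SelectionKey.OP_READ); // re-register read interest
            selector.wakeup();                     // let the selector pick up the change
        } catch (IOException | RuntimeException e) {
            try { ch.close(); } catch (IOException ignored) { }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();       // also wakes up a blocked select()
        server.close();
        workers.shutdownNow();
    }

    /** Starts the reactor, sends "ping" from a blocking client and returns the reply. */
    static String demo() {
        try (ReactorSketch reactor = new ReactorSketch()) {
            Thread loop = new Thread(reactor, "event-demultiplexer");
            loop.setDaemon(true);
            loop.start();
            try (SocketChannel client = SocketChannel.open(
                    new InetSocketAddress("127.0.0.1", reactor.port()))) {
                client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.US_ASCII)));
                ByteBuffer reply = ByteBuffer.allocate(64);
                while (reply.position() < "echo:ping".length() && client.read(reply) >= 0) { }
                reply.flip();
                return StandardCharsets.US_ASCII.decode(reply).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point to notice is in `handle`: the socket read happens on a pool thread, so with a blocking read a slow client would hold that thread for as long as the data takes to arrive.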

A proactor implementation of a container for HTTP, by contrast, may work as follows:

In this case, the event de-multiplexer has more work to do. It maintains a buffer for each socket channel. Whenever a read event occurs, it does a non-blocking read and copies whatever data is available on the socket channel into the buffer. It repeats this until the request is complete. Once it identifies that a complete HTTP request has been assembled in memory, it hands the collected data from the buffer to the thread-pool for processing. Meanwhile, it continues to read further data from the socket channel and fills up the buffer again. To fill the buffers, it can also delegate to an internal pool of worker threads.

As in the reactor approach, the thread-pool invokes the Servlet’s methods, but the read is actually done from an in-memory buffer rather than from the client socket channel.
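
The buffering step of the near-proactor approach can be sketched as a small per-connection assembler. This is a hedged illustration under stated assumptions: `RequestAssembler` and `feed` are hypothetical names, completeness is decided only from the blank line after the headers plus a `Content-Length` header, and chunked transfer encoding and size limits are deliberately ignored. In a real server the event de-multiplexer would call `feed` with the bytes from each non-blocking read and pass every completed request to the thread-pool.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Per-connection buffer that is filled burst by burst (hypothetical helper). */
class RequestAssembler {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /** Adds one burst of bytes and returns every request that is now complete. */
    List<String> feed(byte[] burst) {
        pending.write(burst, 0, burst.length);
        List<String> complete = new ArrayList<>();
        while (true) {
            // ISO-8859-1 maps each byte to one char, so string offsets equal byte offsets.
            String data = new String(pending.toByteArray(), StandardCharsets.ISO_8859_1);
            int headerEnd = data.indexOf("\r\n\r\n");
            if (headerEnd < 0) break;                    // headers still incomplete
            int total = headerEnd + 4 + contentLength(data.substring(0, headerEnd));
            if (data.length() < total) break;            // body still incomplete
            complete.add(data.substring(0, total));
            byte[] rest = data.substring(total).getBytes(StandardCharsets.ISO_8859_1);
            pending.reset();
            pending.write(rest, 0, rest.length);         // keep bytes of the next request
        }
        return complete;
    }

    /** Returns the Content-Length value from the header block, or 0 if absent. */
    private static int contentLength(String headers) {
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        RequestAssembler assembler = new RequestAssembler();
        // One POST request arriving in three bursts, as it might over a slow link.
        String[] bursts = { "POST /data HTTP/1.1\r\nContent-Le", "ngth: 5\r\n\r\nhe", "llo" };
        for (String burst : bursts) {
            List<String> ready = assembler.feed(burst.getBytes(StandardCharsets.ISO_8859_1));
            System.out.println(burst.length() + " bytes in, " + ready.size() + " complete request(s)");
        }
    }
}
```

No application thread is involved until `feed` returns a non-empty list, which is what keeps the thread-pool free while a slow client is still sending.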

Let us compare the two approaches.

In the reactor variation, since the application thread pool is reading from the actual socket, one saves on the memory consumed. There is no buffering up of data except for the actual request data that will be used by the application. When one is concerned about the possible memory consumption due to unknown request sizes, this would be an optimal approach.

On the down-side, consider clients that send data to the server over a fragile, low-bandwidth medium like GPRS. I happen to have some experience developing server-side communicators for GPRS. GPRS actually uses unused time division multiple access (TDMA) slots of a GSM system to transfer data, for example, when there are gaps and pauses in speech. This means that the transfer of data can be very slow, depending on the voice traffic, and moreover the data arrives in bursts rather than as a complete packet. The time gap between bursts of data is arbitrary, depending on the availability of time slots. So, if the reading is done by the Servlet in the context of the thread-pool, a precious container thread stays blocked until more data arrives on the socket. This can bring down the scalability of the application. So, for low-bandwidth transport media, such a server could prove sub-optimal.

In comparison, the proactor variation would have much better throughput and scalability over low-bandwidth transport media, since the event de-multiplexer takes care of buffering data in memory and the thread-pool reads and processes completed requests from in-memory buffers. On the down-side, this may be unnecessary when you are assured of excellent bandwidth on your LAN or WAN, and would be sub-optimal if you have memory constraints. Of course, a proactor container or server requires very careful implementation, with stringent memory handling and timeouts, and users of such containers may have to read the documentation very carefully to configure them optimally. If not, it could even lead to memory leaks.

Most Application Servers/Servlet Containers actually use the reactor pattern, because most web-site-based applications can assume good bandwidth from clients, and because of the hazards of inefficient memory management. Personally, I was hit by this behavior when I used the WebLogic 8.1 Application Server and its Servlet Container to handle HTTP POST data from GPRS-based clients.

In general, when designing server-side applications that only use HTTP as a transport medium, I would recommend a proactor-based approach. To handle memory constraints, a proper timeout mechanism should be implemented to detect idle clients and clean up their buffers.
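
One way to picture such a timeout mechanism is a registry that remembers when each connection last delivered data and drops the buffers of connections that have gone quiet. The sketch below is only an assumption about how this could look: `IdleBufferReaper`, its method names and the string connection ids are hypothetical, and the clock is passed in as a parameter so the behavior is easy to follow.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Tracks per-connection buffers and evicts those of idle clients (hypothetical helper). */
class IdleBufferReaper {
    private static final class Pending {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        volatile long lastActivityMillis;
    }

    private final Map<String, Pending> buffers = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    IdleBufferReaper(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Called by the de-multiplexer whenever a burst of data arrives on a connection. */
    void onData(String connectionId, byte[] burst, long nowMillis) {
        Pending p = buffers.computeIfAbsent(connectionId, id -> new Pending());
        synchronized (p) {
            p.buffer.write(burst, 0, burst.length);
            p.lastActivityMillis = nowMillis;
        }
    }

    /** Drops the buffers of connections idle for longer than the timeout; returns their ids. */
    List<String> evictIdle(long nowMillis) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Pending>> it = buffers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Pending> entry = it.next();
            if (nowMillis - entry.getValue().lastActivityMillis > idleTimeoutMillis) {
                it.remove();                 // releases the buffered bytes
                evicted.add(entry.getKey()); // caller should also close the socket
            }
        }
        return evicted;
    }

    int bufferedConnections() {
        return buffers.size();
    }

    public static void main(String[] args) {
        IdleBufferReaper reaper = new IdleBufferReaper(30_000);
        reaper.onData("client-1", new byte[] { 1, 2, 3 }, 0);
        reaper.onData("client-2", new byte[] { 4 }, 25_000);
        System.out.println("evicted at t=40s: " + reaper.evictIdle(40_000));
        System.out.println("still buffered: " + reaper.bufferedConnections());
    }
}
```

In a running server, `evictIdle(System.currentTimeMillis())` could be called periodically, for example from a `ScheduledExecutorService`, and the returned ids used to close the corresponding socket channels.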


  1. How can you check whether a given Java web server uses a reactor or a proactor implementation?

  2. If the Java servlet container's documentation does not reveal the underlying implementation, you can learn something about it by taking a thread dump while the servlet is reading the body of the request. This is what I did with my servlet deployed on WebLogic 8.1, which was reading POST data from GPRS clients. The thread dump showed that the read was being done from the native socket rather than from a buffer. No wonder my servlet thread was getting blocked, and eventually timed out, when the complete request bytes were not coming in via GPRS.

  3. How does the presence of a web server (like Apache) in front of the Application Server help? I would assume that in this scenario Apache would have already read the request from the client before passing it on to the Application Server, so threads being held up should not be a problem.

  4. Hmmm. That is something we can't really assume. With respect to reactor/proactor, I am not very sure how Apache or any other web server would read the request. Like I said, I would personally not have liked using Apache for GPRS clients, since I would not be sure. For all I knew, the problem of threads being held up would simply have moved to the web server level. A bottleneck at any level would have been sub-optimal. Remember that web servers have been made primarily for internet traffic, which usually has good bandwidth.
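
The thread-dump check mentioned in the comments above is normally done from outside the JVM, for example with `jstack`. As a rough illustration, the same idea can be approximated from inside the JVM with `Thread.getAllStackTraces()` (available since Java 5). The class `SocketReadDetector` and its class-name prefixes are assumptions: the frames that indicate a socket read differ between JVM versions and vendors, so the prefixes would need to be adjusted after looking at a real dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Heuristic check for threads that are reading directly from a socket (hypothetical helper). */
class SocketReadDetector {

    /** Returns true if any frame looks like a read on a socket rather than on a buffer. */
    static boolean looksLikeSocketRead(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            String cls = frame.getClassName();
            String method = frame.getMethodName().toLowerCase(Locale.ROOT);
            // Assumed prefixes; frame names vary by JVM version, so verify against a real dump.
            boolean socketClass = cls.startsWith("java.net.Socket") || cls.startsWith("sun.nio.ch.");
            if (socketClass && method.contains("read")) {
                return true;
            }
        }
        return false;
    }

    /** Names of all live threads whose current stack contains a socket read. */
    static List<String> threadsInSocketRead() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (looksLikeSocketRead(e.getValue())) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Threads in a socket read: " + threadsInSocketRead());
    }
}
```

If a servlet thread shows up in this list while it is reading the request body, the container is most likely handing the servlet the socket itself (the reactor style) rather than an in-memory buffer.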