Socket Distributor design
The Socket Distributor provides a framework for receiving sockets and starting tasks that communicate over socket connections. It is a multi-threaded network server, designed to be reliable and simple, and to cause as few problems as possible when connections fail.
The Socket Distributor is suitable for up to about 1 000 concurrent connections that each last for a limited amount of time. Each connection is handled by a separate thread, but threads are re-used where possible. For example, if there are 10 connections per second and each connection lasts about 1 second, only about 10 threads are created and then re-used.
The Socket Distributor is limited by the amount of memory available (each thread plus socket takes about 100 kilobytes) and by the number of threads the server can efficiently support (Linux appears to be better at this than Windows, for example).
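The figures above can be turned into a rough capacity estimate. The following is a minimal sketch, not part of the Socket Distributor; the class and method names are hypothetical, and the 100 kilobytes per thread is the approximate figure quoted above, not a measured value.

```java
public class CapacityEstimate {

    /**
     * Rough number of threads in use at steady state:
     * arrival rate multiplied by the time each connection stays open.
     */
    public static long steadyStateThreads(double connectionsPerSecond, double secondsPerConnection) {
        return (long) Math.ceil(connectionsPerSecond * secondsPerConnection);
    }

    /** Rough memory use in kilobytes for a given number of thread+socket pairs. */
    public static long memoryKilobytes(long threads, long kilobytesPerThread) {
        return threads * kilobytesPerThread;
    }

    public static void main(String[] args) {
        long threads = steadyStateThreads(10, 1.0);
        System.out.println("threads needed: " + threads);
        System.out.println("memory at 1 000 connections (kB): " + memoryKilobytes(1000, 100));
    }
}
```

With these assumptions, 1 000 concurrent connections need on the order of 100 megabytes for threads and sockets alone.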
If you need to support more open connections at the same time, the "thread per connection" model will not hold up, and neither will the simplicity of the source code. For handling more connections, look into Apache MINA, for example.
The source code of the Socket Distributor is kept small and simple:
- it consists of 3 classes (socket acceptor, exchanger and worker) with a total of about 1000 lines of code (including comments).
- java.util.concurrent is used to take away most of the complexity of dealing with multiple threads (and it also makes it fast).
- it only does what it is designed to do: there are no helper/utility methods (e.g. no "load and set properties" method) and no extension points or plugins.
- limited flexibility: if you want the Socket Distributor to behave differently, extend or change the source code.
No queues with sockets
Sockets that are accepted but not handled by a thread cause problems when the server hangs (e.g. when I/O takes forever or a resource is not available). When a socket is accepted, it must therefore be handed to a thread within milliseconds.
Sockets that are waiting to be accepted can also time out. Given the preceding rule ("No queues with sockets"), this must be avoided, while the server still needs to be reliable. To accomplish this:
- There is a separate thread handling incoming sockets (the server socket thread)
- The server socket thread does not perform any (I/O) operations that can block for an undefined amount of time; it only accepts the incoming socket.
- The server socket thread can always find or start a thread within milliseconds to handle the incoming socket.
The last design principle can potentially blow up the server (each thread takes about 100 kilobytes). See the "Prevent overload" section for ways to limit this risk.
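The "find or start a thread within milliseconds" principle can be sketched with a java.util.concurrent.SynchronousQueue, which has no capacity and therefore never holds a waiting socket. This is only an illustration under assumptions, not the Socket Distributor's actual exchanger: the class name HandOffSketch, the handler callback, and the waitMillis and idleMillis parameters are all hypothetical.

```java
import java.net.Socket;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class HandOffSketch {
    // No capacity: offer() only succeeds while an idle worker is waiting in poll().
    private final SynchronousQueue<Socket> exchange = new SynchronousQueue<>();
    private final AtomicInteger threadsStarted = new AtomicInteger();
    private final Consumer<Socket> handler; // hypothetical: the work done per socket
    private final long waitMillis;          // how long the acceptor waits for an idle worker
    private final long idleMillis;          // how long an idle worker waits before it ends

    public HandOffSketch(Consumer<Socket> handler, long waitMillis, long idleMillis) {
        this.handler = handler;
        this.waitMillis = waitMillis;
        this.idleMillis = idleMillis;
    }

    /** Called by the server socket thread right after accept(); returns within about waitMillis. */
    public void handOff(Socket socket) {
        boolean taken = false;
        try {
            taken = exchange.offer(socket, waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        }
        if (!taken) {
            startWorker(socket); // no idle worker in time: start a new thread
        }
    }

    private void startWorker(Socket first) {
        threadsStarted.incrementAndGet();
        Thread t = new Thread(() -> {
            Socket s = first;
            try {
                while (s != null) {
                    handler.accept(s);
                    // Re-use: wait for the next socket, end the thread if none arrives.
                    s = exchange.poll(idleMillis, TimeUnit.MILLISECONDS);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }

    public int threadsStarted() {
        return threadsStarted.get();
    }
}
```

In this sketch the acceptor never touches the socket's streams; the only blocking call it makes is the bounded offer(), so the time between accept() and a thread owning the socket is capped by waitMillis.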
Prevent overload
Always accepting sockets and starting threads can easily blow up the server. There is no way to completely prevent this from happening, but there are ways to reduce the risk of an overload:
- The server socket thread is configured to wait 5 milliseconds (the default, which can be changed) for a thread to become available. If no thread becomes available within 5 milliseconds, a new thread is started. This "wait for thread" timeout can be increased to give other threads more time to become available, which prevents new threads from being started.
- Each socket worker (the class handling the communication over the socket) is given a boolean variable: tooBusy. The socket worker should use this variable: if tooBusy is true communication must be limited as much as possible and the socket closed as soon as possible.
- Not directly supported by the Socket Distributor, but possible to implement: close sockets immediately when far too many threads are running. The Socket Distributor keeps track of how many threads are running. By comparing this with the maximum number of threads that should be running, you can decide to skip sending the "too busy" message and just close the socket immediately.
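The last two measures amount to a three-way decision per incoming socket. The sketch below shows one possible way to express it; it is an assumption for illustration only. The names OverloadPolicySketch, Action, busyThreshold and maxThreads are hypothetical, and the source does not state how the Socket Distributor itself decides when to set tooBusy.

```java
public class OverloadPolicySketch {

    /** What a worker (or custom code in front of it) does with a new socket. */
    public enum Action { SERVE, SEND_TOO_BUSY, CLOSE_IMMEDIATELY }

    /**
     * @param runningThreads number of worker threads currently running
     * @param busyThreshold  hypothetical level from which workers get tooBusy == true
     * @param maxThreads     hypothetical hard limit from which sockets are just closed
     */
    public static Action decide(int runningThreads, int busyThreshold, int maxThreads) {
        if (runningThreads >= maxThreads) {
            return Action.CLOSE_IMMEDIATELY; // skip even the "too busy" message
        }
        if (runningThreads >= busyThreshold) {
            return Action.SEND_TOO_BUSY;     // minimal communication, then close
        }
        return Action.SERVE;
    }

    public static void main(String[] args) {
        System.out.println(decide(10, 500, 1000));
        System.out.println(decide(600, 500, 1000));
        System.out.println(decide(1200, 500, 1000));
    }
}
```

Keeping the two thresholds separate lets the server degrade in steps: clients first receive a short "too busy" reply, and only under extreme load are connections dropped without any reply.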
The Socket Distributor includes a shutdown procedure that not only closes the server socket thread but also stops the workers (the threads using a socket connection).
The shutdown procedure relies on proper handling of 'stop' signals and thread interrupts. Workers must handle the latter properly as well (they should not swallow an InterruptedException). The rest is handled by the Socket Distributor.
Shutdown is done in 3 parts (with configurable time-outs):
- Close the socket acceptor. Signal 'stop' to all workers. Wait a while.
- Interrupt any workers that did not stop. Wait a while.
- Exit (application closes).
To prevent the exit from hanging, all workers are started as daemon threads. The JVM simply kills these if they are still running. If hanging workers were not daemon threads, the JVM would never exit.
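The three shutdown parts can be sketched as follows. This is a simplified illustration, not the Socket Distributor's own shutdown code: the Worker class, its stop flag, and the method names are hypothetical, and Thread.sleep stands in for the real work a worker would do on its socket.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class ShutdownSketch {

    /** Hypothetical worker that checks a stop flag and does not hide interrupts. */
    public static class Worker implements Runnable {
        private volatile boolean stop;
        private final boolean honoursStop;

        public Worker(boolean honoursStop) { this.honoursStop = honoursStop; }

        public void signalStop() { stop = true; }

        @Override public void run() {
            try {
                while (!(stop && honoursStop)) {
                    Thread.sleep(10); // stands in for work on the socket
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // do not make the interrupt disappear
            }
        }
    }

    /** Workers run as daemon threads so a hanging worker cannot block JVM exit. */
    public static Thread startDaemon(Worker w) {
        Thread t = new Thread(w);
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Runs parts 1 and 2 and returns how many workers are still alive; the caller then exits. */
    public static int shutdown(Closeable acceptor, List<Worker> workers, List<Thread> threads,
                               long stopWaitMillis, long interruptWaitMillis) {
        // Part 1: close the acceptor, signal 'stop', wait a while.
        try { acceptor.close(); } catch (IOException ignored) { }
        for (Worker w : workers) w.signalStop();
        joinAll(threads, stopWaitMillis);
        // Part 2: interrupt workers that did not stop, wait a while.
        for (Thread t : threads) if (t.isAlive()) t.interrupt();
        joinAll(threads, interruptWaitMillis);
        int alive = 0;
        for (Thread t : threads) if (t.isAlive()) alive++;
        return alive;
    }

    private static void joinAll(List<Thread> threads, long totalMillis) {
        long deadline = System.nanoTime() + totalMillis * 1_000_000L;
        for (Thread t : threads) {
            long remaining = (deadline - System.nanoTime()) / 1_000_000L;
            if (remaining <= 0) break; // join(0) would wait forever
            try {
                t.join(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

A worker created with honoursStop set to false ignores the stop signal in part 1 and only ends when it is interrupted in part 2, which is the case the second time-out exists for.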