Monday, February 3, 2014

Syns, Syn Cookies, TCP Listen Backlog: More Complicated than You Think

No, syncookies don't have anything to do with dieting.  But they did come up as I learned that the TCP listen backlog is more complicated than I thought.  This article should help those of you trying to support TCP servers with lots of clients, especially if large numbers of clients can try to connect at the same time.  (For example, a popular web server.)  This is Linux-oriented; I'm not sure how applicable the info is for other OSes.


Here is an article which talks about the TCP listen backlog. Here are some quotes:
    The backlog has an effect on the maximum rate at which a server can accept new TCP connections on a socket. ... Many systems (particularly BSD-derived or influenced) silently truncate this value (the backlog parameter to the listen() system call) to 5 — version 1.2.13 of the Linux kernel [really old - SF] does this ... Using small values for the listen backlog was one of the major causes of poor web server performance with many operating systems up until recently. ... The backlog parameter is silently truncated to SOMAXCONN ... defined as 128 in /usr/src/linux/socket.h for 2.x kernels.
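
To make the truncation concrete: on Linux the cap is applied when listen() is called, and the limit is the run-time sysctl net.core.somaxconn (readable at /proc/sys/net/core/somaxconn), not a fixed constant. The little shell function below is only a model of that min(backlog, somaxconn) rule, assuming a Linux host; effective_backlog is a hypothetical helper name, not a system tool.

```sh
# Model of how Linux clamps the backlog passed to listen():
# the value actually used is min(requested, net.core.somaxconn).
# effective_backlog is a hypothetical helper, not a system command.
effective_backlog() {
    requested=$1
    # Use an explicit cap if one is given; otherwise read the live sysctl value.
    cap=${2:-$(cat /proc/sys/net/core/somaxconn)}
    if [ "$requested" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$requested"
    fi
}

effective_backlog 1024 128   # prints 128: the request is silently truncated
effective_backlog 64 128     # prints 64: small requests pass through unchanged
```

Calling it with only the first argument (e.g. `effective_backlog 1024`) compares against the host's real limit.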

Here is a brilliant writeup that taught me about "syncookies", and how they can lead to hung clients. Basically, if the listen backlog (a.k.a. the SYN queue) fills up and more client connection requests (SYNs) come in, the server will *act* like it is accepting them by replying with a SYN-ACK whose initial sequence number is a specially encoded "cookie". But the kernel won't actually set up any state for those connections, and the app won't hear about them yet. Instead, the server waits for the client to respond with the ACK (the third step of the 3-way handshake). Because that ACK echoes the cookie back (its acknowledgment number is the cookie plus one), the server can recover what it needs from the original SYN, such as the MSS, and the kernel then opens the connection as normal. HOWEVER, if the client's ACK gets lost in a switch, the NIC, or wherever, the client will be left thinking the connection was accepted and is ready, while the server has no memory of it.
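
If you suspect a server is taking the syncookie path, the kernel keeps counters for it. On Linux, /proc/net/netstat contains a pair of "TcpExt:" lines (counter names, then values) that include SyncookiesSent, SyncookiesRecv, SyncookiesFailed, ListenOverflows and ListenDrops, and /proc/sys/net/ipv4/tcp_syncookies shows whether cookies are enabled at all (0 = off, 1 = only when the queue overflows, 2 = always). The sketch below pulls one counter out by name; tcpext_counter is a hypothetical helper. If iproute2 is installed, `nstat -az TcpExtSyncookiesSent` reports the same data.

```sh
# Print one TcpExt counter (e.g. SyncookiesSent) from /proc/net/netstat.
# The file holds a "TcpExt:" header line of names followed by a
# "TcpExt:" line of values; we map names to columns, then print the value.
# tcpext_counter is a hypothetical helper; $2 optionally overrides the file.
tcpext_counter() {
    awk -v name="$1" '
        $1 == "TcpExt:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "TcpExt:" && seen  { if (name in col) print $(col[name]); exit }
    ' "${2:-/proc/net/netstat}"
}

# Usage (Linux):
#   for c in SyncookiesSent SyncookiesRecv SyncookiesFailed ListenOverflows ListenDrops; do
#       echo "$c $(tcpext_counter "$c")"
#   done
```

A SyncookiesSent count that climbs during connection bursts suggests the queue is overflowing and some clients are depending on their final ACK getting through.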

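A lost ACK like that leads to a genuine hang if the application protocol depends on the server sending the first message, as SMTP and MySQL do. Since the client has nothing to send, no further packet ever reaches the server that could complete (or reset) the connection, so the client app will hang forever waiting for the server's message unless it uses a read timeout.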


Here is an article which gives advice on how to set up systems that can accept lots of TCP connections. Here's a quote:
    Three system configuration parameters must be set to support a large number of open files and TCP connections with large bursts of messages. Changes can be made using the /etc/rc.d/rc.local or /etc/sysctl.conf script to preserve changes after reboot. In either case, you can write values directly into these files (e.g. "echo 32832 > /proc/sys/fs/file-max").
    • /proc/sys/fs/file-max: The maximum number of concurrently open files. We recommend a limit of at least 32,832.
    • /proc/sys/net/ipv4/tcp_max_syn_backlog: Maximum number of remembered connection requests, which [have] still [not received] an acknowledgment from connecting client. The default value is 1024 for systems with more than 128Mb of memory, and 128 for low memory machines. If server suffers [from] overload, try to increase this number.
    • /proc/sys/net/core/somaxconn: Limit of socket listen() backlog, known in userspace as SOMAXCONN. Defaults to 128. The value should be raised substantially to support bursts of request. For example, to support a burst of 1024 requests, set somaxconn to 1024.
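
As a concrete version of the quoted advice, here is one way to apply the two backlog settings on a running Linux host and make them persist (run as root). The numbers are illustrative: 1024 follows the article's burst example, and 2048 for tcp_max_syn_backlog is an arbitrary choice, not a recommendation. fs.file-max is left out because on many systems it is already well above 32,832; check it before writing a value that could lower it.

```sh
# Run as root. Example values only; tune them for your own traffic.
# Apply immediately (lost on reboot):
sysctl -w net.core.somaxconn=1024
sysctl -w net.ipv4.tcp_max_syn_backlog=2048

# Persist across reboots by appending to /etc/sysctl.conf, then reload it:
cat >> /etc/sysctl.conf <<'EOF'
net.core.somaxconn = 1024
net.ipv4.tcp_max_syn_backlog = 2048
EOF
sysctl -p
```

A larger somaxconn only applies to listen() calls made after the change, so the server needs a restart before it can benefit.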

Here are some commands I entered on our host Saturn:

   sford@Saturn$ cat /proc/sys/net/core/somaxconn
   sford@Saturn$ cat /proc/sys/net/ipv4/tcp_max_syn_backlog
   sford@Saturn$ cat /proc/sys/fs/file-max

Looks like the main thing we need to do is increase somaxconn, and maybe tcp_max_syn_backlog as well.
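
A quick way to recheck these after tuning: `sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog fs.file-max` prints all three at once, and the small function below flags any value under a target. check_tunable is a hypothetical helper; the somaxconn and file-max targets in the usage comments come from the quoted article, while 1024 for tcp_max_syn_backlog is only illustrative.

```sh
# Flag a /proc/sys tunable whose current value is below a target.
# check_tunable is a hypothetical helper; exit status 1 means "too low",
# 2 means the file could not be read.
check_tunable() {
    file=$1 want=$2
    have=$(cat "$file") || return 2
    if [ "$have" -lt "$want" ]; then
        echo "LOW  $file = $have (want >= $want)"
        return 1
    fi
    echo "OK   $file = $have"
}

# Example targets:
#   check_tunable /proc/sys/net/core/somaxconn 1024
#   check_tunable /proc/sys/net/ipv4/tcp_max_syn_backlog 1024
#   check_tunable /proc/sys/fs/file-max 32832
```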


One small concern. I saw various references to SOMAXCONN as being a constant in a system include file. It apparently lives in different places, depending on the OS flavor/version; I found it here:

   sford@Saturn$ find /usr/include | xargs egrep SOMAXCONN
   /usr/include/bits/socket.h:#define SOMAXCONN 128

So now the question becomes: if we update the tuning parameter, do we also have to modify the include file? My gut says no. I'm thinking that if you built the kernel from source, you might change the default via an include file, but on a running system you simply override that default with the tuning parameter, and you can magically use larger numbers in the listen() call.
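
One caveat worth knowing here: the header constant only matters to programs that literally call listen(fd, SOMAXCONN). Such a program, compiled against this header, asks for 128 and keeps getting 128 no matter how high net.core.somaxconn goes, while a program that asks for, say, 1024 gets min(1024, net.core.somaxconn). To see what each server actually got, `ss -lnt` helps: for sockets in LISTEN state, the Send-Q column shows the backlog the kernel applied and Recv-Q shows how many connections are currently waiting to be accepted. The sketch below just reformats that output; listen_backlogs is a hypothetical helper, and it assumes the usual `ss -lnt` column order (State, Recv-Q, Send-Q, Local Address:Port, ...).

```sh
# Print "local-address backlog" for each listening TCP socket.
# Assumes Linux `ss -lnt` output, where for LISTEN sockets Send-Q is the
# backlog the kernel applied and Recv-Q is the current accept-queue length.
# listen_backlogs is a hypothetical helper, not part of iproute2.
listen_backlogs() {
    # Skip the header line; print column 4 (local address) and 3 (Send-Q).
    awk '$1 == "LISTEN" { print $4, $3 }'
}

# Usage:
#   ss -lnt | listen_backlogs
```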
