01-13-2004
Rick Jones
 
Re: SLOW sockets under Redhat 9.0

David J. Bakeman <dbakeman@comcast.net> wrote:
> We do have the TCP_NODELAY set. By the way what do you mean by Nagle?


The Nagle algorithm is what one disables by setting TCP_NODELAY. In
broad terms, the Nagle algorithm is supposed to work like this:

1) Is this send, plus any queued, unsent data, >= the MSS (Maximum
Segment Size) of the connection? If yes, send the data now (modulo
other limits). If no, go to question 2:

2) Is the connection otherwise idle? That is, is there no unACKed
data outstanding on the network? If yes, send the data now. If no,
queue the data and wait for either more data to be sent, or the
outstanding data to be ACKed.
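
As a rough sketch only, the two questions above can be written as one
small function. The function and parameter names are mine, made up for
illustration, and a real stack applies other limits (windows and so
on) that are not modelled here:

```python
def nagle_allows_send(new_bytes, queued_bytes, unacked_bytes, mss):
    """Return True if the data may go out now, False if it gets queued.

    Hypothetical helper: a model of the two questions, not stack code.
    """
    # Question 1: is this send plus queued, unsent data at least one MSS?
    if new_bytes + queued_bytes >= mss:
        return True
    # Question 2: is the connection otherwise idle (no unACKed data)?
    if unacked_bytes == 0:
        return True
    # Otherwise queue, and wait for more data or for the ACK.
    return False
```

So a 100 byte send on an idle connection goes out at once, while the
same send behind 200 unACKed bytes is held back.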

That then ties in to remote ACK policies. In even _broader_ terms, an
ACK is generated by TCP when:

a) there is data to be sent the other way, ACK piggybacked
b) it is time to send a window update, ACK piggybacked
c) the standalone ACK timer expires
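
Those three triggers, in the same broad terms, might be sketched like
this. The function name and the boolean inputs are invented for the
illustration; a real receiver tracks a good deal more state:

```python
def ack_trigger(reverse_data_pending, window_update_due, ack_timer_expired):
    """Return which rule ("a", "b" or "c") produces an ACK, or None.

    Hypothetical helper that only mirrors the list above.
    """
    if reverse_data_pending:
        return "a"  # ACK piggybacked on data going the other way
    if window_update_due:
        return "b"  # ACK piggybacked on a window update
    if ack_timer_expired:
        return "c"  # standalone ACK once the timer expires
    return None     # nothing to send yet, the ACK is still pending
```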

If an application (IMO a broken one) sends "logically associated" data
in separate calls to "send" and those bits and pieces are < MSS, then
it will experience what are often called "Nagle induced delays."

The example I like to use is an email application that sends the email
header to the transport separate from the email data. It then waits
for an application-level ack before sending the next message.

The headers will go out, but the second, sub-MSS send will queue
waiting for more data or an ACK. As there is no more data, it awaits
the ACK from the receiving TCP. There is no data to go back to the
sender - the entire email hasn't arrived, so no ACK from "a." There is
no need for a window update, so we don't get "b." That means we wait
for "c."
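
To put rough numbers on that wait, here is a back-of-the-envelope
model. It assumes equal one-way delays in each direction and a
standalone ACK timer that starts when the header arrives. Both are
simplifications of mine, and the timer value in the usage line is only
an illustrative figure, not one taken from any particular stack:

```python
def second_send_departure(rtt, ack_timer):
    """Time (seconds after the first send) at which the queued second
    send can finally leave. Hypothetical model, not a measurement."""
    header_arrives = rtt / 2                  # header reaches the receiver
    ack_sent = header_arrives + ack_timer     # rule "c": timer has to expire
    ack_arrives = ack_sent + rtt / 2          # ACK gets back to the sender
    return ack_arrives                        # Nagle releases the second send


# Example: 1 ms round trip, assumed 200 ms standalone ACK timer.
delay = second_send_departure(rtt=0.001, ack_timer=0.2)
```

With a short round trip the ACK timer dominates the result, which is
why the delay shows up even on a fast LAN.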

This then is the "Nagle induced slowness."

Many folks take the "quick out" of simply setting TCP_NODELAY. IMO
the "proper" fix is to present logically associated data to the
transport at the same time.
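
A minimal sketch of both approaches, using Python's standard socket
module. The two helper names are hypothetical, and joining the pieces
in user space is just one way of presenting them together:

```python
import socket


def disable_nagle(sock):
    """The "quick out": switch off the Nagle algorithm on a TCP socket."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def send_email(sock, header, body):
    """The "proper" fix: hand header and body to the transport together,
    in a single call, so there is no separate sub-MSS second send."""
    sock.sendall(header + body)
```

With send_email() the transport sees one send, so a small message goes
out in full as soon as the connection is idle.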

There have also been stacks that have botched their implementation of
Nagle by making that decision on a segment by segment basis rather
than a user send by user send basis. This has resulted in problems
even when the application is making sends > MSS. One way to see if you
are on such a stack is to run a netperf TCP_RR test
(http://www.netperf.org) - set the request/response size to MSS (often
1460, sometimes 1448). Note the transaction rate. Then increase the
request/response size by one byte. If there is a huge drop in
performance, the stack at one end or the other has a broken
implementation of the Nagle algorithm.
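
As a sketch, the two runs could be set up with a helper like the one
below. It only builds the command lines and does not run them. The
helper name and "remotehost" are placeholders, the MSS of 1460 is an
assumption for the path being tested, and the options are as I
understand netperf's command line, so check them against the manual
for your version:

```python
def netperf_rr_command(host, size):
    """Build a netperf TCP_RR command line (hypothetical helper).

    -H names the remote host running netserver, -t picks the test, and
    after "--" the test-specific -r sets request,response sizes in bytes.
    """
    return ["netperf", "-H", host, "-t", "TCP_RR",
            "--", "-r", f"{size},{size}"]


mss = 1460  # assumed MSS; some paths give 1448
for size in (mss, mss + 1):
    print(" ".join(netperf_rr_command("remotehost", size)))
```

Compare the transaction rates reported by the two runs.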

There are also quasi-real-time unidirectional apps that do indeed
"need" to set TCP_NODELAY - say something sending frequent XY position
data or the like.

Lots more can be found via Google, including discussion of Linux's
TCP_CORK option.

rick jones
--
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...