SLOW sockets under Redhat 9.0 (Linux Networking forums, part of the Linux Forums category)
I have an application that uses a simple tcp/ip socket to communicate between 2 processes. This has worked fine until we upgraded to RH9.0. Now the socket connection is incredibly slow! What used to take around 5ms now takes 20-90ms!

Anybody have any ideas?

I've checked all of my calls to make sure I'm not using anything that's deprecated (I'm not) and I've tried changing the calls even though this setup's been working for over 3 years since we first ported our app from the SGI to RH6.1.

Thanks for any help!!

David J. Bakeman
dbakeman@comcast.net
Perchance you have an nvidia chipset?

Shooting in the dark since you did not give any system or network config information.

CL
On 2004-01-01, David J. Bakeman <dbakeman@comcast.net> wrote:
> I have an application that uses a simple tcp/ip socket to communicate
> between 2 processes. This has worked fine until we upgraded to RH9.0.
> Now the socket connection is incredibly slow! What used to take around
> 5ms now takes 20-90ms!

Might this be a Nagle algorithm issue? If you are sending tiny bits of data, you might want to set the TCP_NODELAY socket option.

Chris
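For anyone following along, here is a minimal sketch of setting that option, written in Python rather than in the original poster's code (which the thread does not show). The helper name `make_nodelay_socket` is made up for illustration; the `setsockopt` call with `IPPROTO_TCP` and `TCP_NODELAY` is the standard sockets API.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on, so small writes are not held back
    # by the Nagle algorithm.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back to confirm it took effect on this socket.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

The option applies per socket, so in a two-process setup both ends would need it if both send small messages.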
Carl wrote:
> perchance you have an nvidia chipset?

We have the nvidia graphics chipset.

> shooting in the dark since you did not give any system or network config
> information.

I should have clarified things. The 2 processes are usually on the same machine, so I assumed that the network hardware did not matter. Also, the code's been working fine from rh6.1-7.3 (and several versions of the SGI OS).
Christopher Wong wrote:
> On 2004-01-01, David J. Bakeman <dbakeman@comcast.net> wrote:
>
>> I have an application that uses a simple tcp/ip socket to communicate
>> between 2 processes. This has worked fine until we upgraded to RH9.0.
>> Now the socket connection is incredibly slow! What used to take around
>> 5ms now takes 20-90ms!
>
> Might this be a Nagle algorithm issue? If you are sending tiny bits of
> data, you might want to set the TCP_NODELAY socket option.

We do have the TCP_NODELAY set. By the way, what do you mean by Nagle?
David J. Bakeman <dbakeman@comcast.net> wrote:
> I have an application that uses a simple tcp/ip socket to communicate
> between 2 processes. This has worked fine until we upgraded to RH9.0.
> Now the socket connection is incredibly slow! What used to take around
> 5ms now takes 20-90ms!
>
> Anybody have any ideas?

DISCLAIMER: I am not a RH user/administrator or developer.

One of the largest differences (and the one that breaks more software than anything else) is that RH9 uses a different thread library called NPTL (Native POSIX Thread Library), which is better than the older LinuxThreads implementation of pthreads (according to the literature anyway). Are you (or your libraries) using threads at all? If so, it may be that you're using threads in a not quite standard way (that still works well under LinuxThreads).

Other possibilities:

What kernel are you using now, and which one were you using when it last worked well? Is it a general slowness, or does it appear to be spiky? Does it affect other applications? What about non-RH provided source packages of some bandwidth testing tool? (non-RH ones may show the same problem).

--
Cameron Kerr
cameron.kerr@paradise.net.nz : http://nzgeeks.org/cameron/
Empowered by Perl!
David J. Bakeman wrote:
> Carl wrote:
>
>> perchance you have an nvidia chipset?
>
> We have the nvidia graphics chipset.

Yes, but do you have the NVIDIA nforce chipset, which includes the network card? If so, have you installed the crappy nforce261 drivers? If so, you need to do some fixing to those crappy drivers; they would be your problem.

>> shooting in the dark since you did not give any system or network
>> config information.
>
> I should have clarified things. The 2 processes are usually on the same
> machine. So I assumed that the network hardware did not matter. Also
> the codes been working fine from rh6.1-7.3 (and several versions of the
> SGI os)

That depends on how the processes communicate: whether they use 'localhost' or the actual IP address to talk with each other.

clg
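One way to compare the two cases is to time small request/response round trips over a TCP socket, once against 'localhost' and once against the machine's real address. The sketch below is only an illustration in Python, not the original application: the helper names `echo_until_closed` and `measure_rtt_ms`, the 64-byte payload, and the number of rounds are all assumptions made up for the example.

```python
import socket
import threading
import time


def echo_until_closed(listener: socket.socket) -> None:
    """Accept one connection and echo everything back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def measure_rtt_ms(host: str = "127.0.0.1", rounds: int = 20,
                   payload: bytes = b"x" * 64) -> list:
    """Return per-round request/response times in milliseconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_until_closed, args=(listener,),
                              daemon=True)
    server.start()

    rtts = []
    with socket.create_connection((host, port)) as client:
        # Match the thread: the original poster says TCP_NODELAY is set.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            client.sendall(payload)
            received = 0
            while received < len(payload):
                chunk = client.recv(4096)
                if not chunk:
                    raise ConnectionError("echo server closed early")
                received += len(chunk)
            rtts.append((time.perf_counter() - start) * 1000.0)

    server.join(timeout=2)
    listener.close()
    return rtts


if __name__ == "__main__":
    samples = measure_rtt_ms("127.0.0.1")
    print(f"min {min(samples):.3f} ms, max {max(samples):.3f} ms")
```

Running it a second time with the host's actual IP address in place of `127.0.0.1` would show whether the path taken makes a difference on a given machine.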
David J. Bakeman <dbakeman@comcast.net> wrote:
> We do have the TCP_NODELAY set. By the way what do you mean by Nagle?

The Nagle algorithm is what one disables by setting TCP_NODELAY.

In broad terms, the Nagle algorithm is supposed to work like this:

1) Is this send, plus any queued, unsent data, >= the MSS (Maximum Segment Size) of the connection? If yes, send data now (modulo other limits). If no, go to question 2.

2) Is the connection otherwise idle? That is, is there no unACKed data outstanding on the network? If yes, send data now. If no, queue data and wait for either more data to be sent, or the outstanding data to be ACKed.

That then ties in to remote ACK policies. In even _broader_ terms, an ACK is generated by TCP when:

a) there is data to be sent the other way, ACK piggybacked
b) it is time to send a window update, ACK piggybacked
c) the standalone ACK timer expires

If an application (IMO a broken one) sends "logically associated" data in separate calls to "send" and those bits and pieces are < MSS, then it will experience what are often called "Nagle induced delays."

The example I like to use is an email application that sends the email header to the transport separately from the email data. It then waits for an application-level ack before sending the next message. The headers will go out, but the second, sub-MSS send will queue waiting for more data or an ACK. As there is no more data, it awaits the ACK from the receiving TCP. There is no data to go back to the sender - the entire email hasn't arrived, so no ACK from "a." There is no need for a window update, so we don't get "b." That means we wait for "c." This then is the "Nagle induced slowness."

Many folks take the "quick out" of simply setting TCP_NODELAY. IMO the "proper" fix is to present logically associated data to the transport at the same time.

There have also been stacks that have botched their implementation of Nagle by making that interpretation on a segment by segment basis rather than a user send by user send basis. This has resulted in problems even when the application is making sends > MSS. One way to see if you are on such a stack is to run a netperf TCP_RR test (http://www.netperf.org) - set the request/response size to MSS (often 1460, sometimes 1448). Note the transaction rate. Then increase the request/response size by one byte. If there is a huge drop in performance, the stack at one end or the other has a broken implementation of the Nagle algorithm.

There are also quasi-real-time unidirectional apps that do indeed "need" to set TCP_NODELAY - say something sending frequent XY position data or the like.

Lots more can be found via Google, including discussion of Linux's TCP_CORK option.

rick jones
--
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
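The two questions in the description above can be written down as a small decision function. This is only a sketch of the broad-terms description in the post, in Python, and not how any real TCP stack is implemented; the function name `nagle_send_now`, its parameters, and the 200- and 300-byte sizes are made up for illustration, and 1460 is used as the MSS because the post mentions it as a common value.

```python
def nagle_send_now(new_bytes: int, queued_bytes: int, unacked_bytes: int,
                   mss: int = 1460) -> bool:
    """Return True if, per the broad-terms description, data goes out now.

    Hypothetical helper: ignores the "other limits" (window, congestion)
    that the post mentions only in passing.
    """
    # Question 1: is this send plus any queued, unsent data >= the MSS?
    if new_bytes + queued_bytes >= mss:
        return True
    # Question 2: is the connection otherwise idle (no unACKed data)?
    return unacked_bytes == 0


# The email example: header and body handed to the transport separately.
header_len, body_len = 200, 300

# First send: nothing outstanding, so the sub-MSS header goes out at once.
header_sent_now = nagle_send_now(header_len, queued_bytes=0, unacked_bytes=0)

# Second send: the header is still unACKed, so the sub-MSS body is queued
# until more data arrives or the ACK comes back.
body_sent_now = nagle_send_now(body_len, queued_bytes=0,
                               unacked_bytes=header_len)

# The "proper" fix from the post: present header and body in one send.
combined_sent_now = nagle_send_now(header_len + body_len, queued_bytes=0,
                                   unacked_bytes=0)
```

In this sketch the separate body send is the only one that gets held back, which is the case where the sender ends up waiting on the receiver's standalone ACK timer.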