socket stops, bytes stuck in send-Q
I've run into a weird problem recently, where a socket works for a little while and then stops, with bytes seemingly stuck in the send-Q.

The basic setup is this: I have a client (Java) and a server (C++) that are running on the same machine. The client sends some commands to the server, and gets some commands and data back in response.

When I hit the problem, at some point while the client is reading the response from the server, the socket reports no data available for reading, and stays in that state. According to logs on the server side, the server has successfully written everything to the socket. Once the socket gets goofed up, netstat reports that the send-Q on the server end has a bunch of bytes, and the recv-Q on the client end is empty.

The problem seems to be at least partially something related to state in the OS itself, because after a reboot things seem to work okay for a while (e.g., a week or two). Once the error starts happening, I can start new clients and servers as often as I want to and the problem pretty reliably occurs, and after roughly the same number of bytes are sent on the socket each time.

More details are below, for those who are interested.

I'd really appreciate it if anyone has any insights into this problem...

Kent Wenger

--------------------------------------------------------------

This is on Tao Linux 1.0. We switched from RedHat 9.0 to Tao a few months ago, and I haven't seen the problem before the OS change, but I'm not totally sure how related the problem is to the OS change.

Here's info about the kernel:

uname -a
Linux pumori.cs.wisc.edu 2.4.21-15.0.4.TLsmp #1 SMP Tue Aug 3 23:02:48 EDT 2004 i686 unknown

Okay, here's an example. The server is listening on port 54409.
Here's what netstat has to say once the socket has gotten goofed up:

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0  16383 localhost:54409         localhost:54412         ESTABLISHED
tcp        0      0 localhost:54412         localhost:54409         ESTABLISHED

Here's tcpdump monitoring port 54409:

tcpdump: WARNING: Promiscuous mode not supported on the "any" device
tcpdump: listening on any
16:45:28.190869 localhost.54412 > localhost.54409: P 2514246381:2514246447(66) ack 2514402135 win 32767 <nop,nop,timestamp 123647281 123638448> (DF) [ttl 0]
16:45:28.190932 localhost.54409 > localhost.54412: . ack 66 win 32767 <nop,nop,timestamp 123647281 123647281> (DF) [ttl 0]
16:45:28.221828 localhost.54409 > localhost.54412: P 1:28(27) ack 66 win 32767 <nop,nop,timestamp 123647284 123647281> (DF) [ttl 0]
16:45:28.221894 localhost.54412 > localhost.54409: . ack 28 win 32767 <nop,nop,timestamp 123647284 123647284> (DF) [ttl 0]
16:45:28.706255 localhost.54412 > localhost.54409: P 66:132(66) ack 28 win 32767 <nop,nop,timestamp 123647332 123647284> (DF) [ttl 0]
16:45:28.712470 localhost.54409 > localhost.54412: P 28:55(27) ack 132 win 32767 <nop,nop,timestamp 123647333 123647332> (DF) [ttl 0]
16:45:28.712538 localhost.54412 > localhost.54409: . ack 55 win 32767 <nop,nop,timestamp 123647333 123647333> (DF) [ttl 0]
16:45:28.750493 localhost.54412 > localhost.54409: P 132:181(49) ack 55 win 32767 <nop,nop,timestamp 123647337 123647333> (DF) [ttl 0]
16:45:28.788871 localhost.54409 > localhost.54412: . ack 181 win 32767 <nop,nop,timestamp 123647341 123647337> (DF) [ttl 0]
16:45:29.832424 localhost.54409 > localhost.54412: P 55:309(254) ack 181 win 32767 <nop,nop,timestamp 123647445 123647337> (DF) [ttl 0]
16:45:29.832513 localhost.54412 > localhost.54409: . ack 309 win 32767 <nop,nop,timestamp 123647445 123647445> (DF) [ttl 0]
16:45:29.835069 localhost.54409 > localhost.54412: P 309:558(249) ack 181 win 32767 <nop,nop,timestamp 123647445 123647445> (DF) [ttl 0]
16:45:29.835112 localhost.54412 > localhost.54409: . ack 558 win 32767 <nop,nop,timestamp 123647445 123647445> (DF) [ttl 0]
16:45:29.837560 localhost.54409 > localhost.54412: P 558:809(251) ack 181 win 32767 <nop,nop,timestamp 123647445 123647445> (DF) [ttl 0]
16:45:29.837597 localhost.54412 > localhost.54409: . ack 809 win 32767 <nop,nop,timestamp 123647445 123647445> (DF) [ttl 0]
16:45:29.840035 localhost.54409 > localhost.54412: P 809:1060(251) ack 181 win 32767 <nop,nop,timestamp 123647446 123647445> (DF) [ttl 0]
16:45:29.840076 localhost.54412 > localhost.54409: . ack 1060 win 32767 <nop,nop,timestamp 123647446 123647446> (DF) [ttl 0]
16:45:29.842522 localhost.54409 > localhost.54412: P 1060:1317(257) ack 181 win 32767 <nop,nop,timestamp 123647446 123647446> (DF) [ttl 0]
16:45:29.842559 localhost.54412 > localhost.54409: . ack 1317 win 32767 <nop,nop,timestamp 123647446 123647446> (DF) [ttl 0]
16:45:29.844998 localhost.54409 > localhost.54412: P 1317:1580(263) ack 181 win 32767 <nop,nop,timestamp 123647446 123647446> (DF) [ttl 0]
16:45:29.845038 localhost.54412 > localhost.54409: . ack 1580 win 32767 <nop,nop,timestamp 123647446 123647446> (DF) [ttl 0]
16:45:29.847478 localhost.54409 > localhost.54412: P 1580:1829(249) ack 181 win 32767 <nop,nop,timestamp 123647446 123647446> (DF) [ttl 0]
16:45:29.847516 localhost.54412 > localhost.54409: . ack 1829 win 32767 <nop,nop,timestamp 123647446 123647446> (DF) [ttl 0]
16:45:29.850161 localhost.54409 > localhost.54412: P 1829:1957(128) ack 181 win 32767 <nop,nop,timestamp 123647447 123647446> (DF) [ttl 0]
16:45:29.850204 localhost.54412 > localhost.54409: . ack 1957 win 32767 <nop,nop,timestamp 123647447 123647447> (DF) [ttl 0]
16:45:29.851291 localhost.54409 > localhost.54412: P 1957:2082(125) ack 181 win 32767 <nop,nop,timestamp 123647447 123647447> (DF) [ttl 0]
16:45:29.851327 localhost.54412 > localhost.54409: . ack 2082 win 32767 <nop,nop,timestamp 123647447 123647447> (DF) [ttl 0]
16:45:29.883892 localhost.54409 > localhost.54412: P 2082:2155(73) ack 181 win 32767 <nop,nop,timestamp 123647450 123647447> (DF) [ttl 0]
16:45:29.883972 localhost.54412 > localhost.54409: . ack 2155 win 32767 <nop,nop,timestamp 123647450 123647450> (DF) [ttl 0]
16:45:29.886206 localhost.54409 > localhost.54412: P 2155:2313(158) ack 181 win 32767 <nop,nop,timestamp 123647450 123647450> (DF) [ttl 0]
16:45:29.886245 localhost.54412 > localhost.54409: . ack 2313 win 32609 <nop,nop,timestamp 123647450 123647450> (DF) [ttl 0]
16:45:29.887798 localhost.54409 > localhost.54412: P 2313:2453(140) ack 181 win 32767 <nop,nop,timestamp 123647450 123647450> (DF) [ttl 0]
16:45:29.887838 localhost.54412 > localhost.54409: . ack 2453 win 32469 <nop,nop,timestamp 123647450 123647450> (DF) [ttl 0]
16:45:29.889243 localhost.54409 > localhost.54412: P 2453:2576(123) ack 181 win 32767 <nop,nop,timestamp 123647451 123647450> (DF) [ttl 0]
16:45:29.889280 localhost.54412 > localhost.54409: . ack 2576 win 32346 <nop,nop,timestamp 123647451 123647451> (DF) [ttl 0]
16:45:29.900637 localhost.54409 > localhost.54412: P 2576:2647(71) ack 181 win 32767 <nop,nop,timestamp 123647452 123647451> (DF) [ttl 0]
16:45:29.938981 localhost.54412 > localhost.54409: . ack 2647 win 32275 <nop,nop,timestamp 123647456 123647452> (DF) [ttl 0]
16:45:29.939058 localhost.54409 > localhost.54412: P 2647:3597(950) ack 181 win 32767 <nop,nop,timestamp 123647456 123647456> (DF) [ttl 0]
16:45:29.978934 localhost.54412 > localhost.54409: . ack 3597 win 31325 <nop,nop,timestamp 123647460 123647456> (DF) [ttl 0]
16:45:29.978999 localhost.54409 > localhost.54412: P 3597:3953(356) ack 181 win 32767 <nop,nop,timestamp 123647460 123647460> (DF) [ttl 0]
16:45:30.018939 localhost.54412 > localhost.54409: . ack 3953 win 30969 <nop,nop,timestamp 123647464 123647460> (DF) [ttl 0]
16:45:30.019015 localhost.54409 > localhost.54412: P 3953:4287(334) ack 181 win 32767 <nop,nop,timestamp 123647464 123647464> (DF) [ttl 0]
16:45:30.058938 localhost.54412 > localhost.54409: . ack 4287 win 30635 <nop,nop,timestamp 123647468 123647464> (DF) [ttl 0]
16:45:30.064350 localhost.54409 > localhost.54412: P 4287:4353(66) ack 181 win 32767 <nop,nop,timestamp 123647468 123647468> (DF) [ttl 0]
16:45:30.098935 localhost.54412 > localhost.54409: . ack 4353 win 30569 <nop,nop,timestamp 123647472 123647468> (DF) [ttl 0]

90 packets received by filter
0 packets dropped by kernel
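
For anyone who wants to summarize a trace like the one above instead of reading it line by line, here is a minimal Python sketch. It is not part of the original setup, and the names `LINE_RE`, `summarize`, and `SAMPLE` are made up for illustration. It assumes the old-style tcpdump line format shown in this post. For each sending endpoint it tallies the payload bytes, the last ack number it sent, and the smallest window it advertised. It only restates what the trace contains (for example, that the client's advertised window falls from 32767 to 30569 in the packets shown). It does not diagnose the stall.

```python
import re
from collections import defaultdict

# Assumption: lines follow the old tcpdump TCP format seen in this post, e.g.
#   HH:MM:SS.us src > dst: FLAGS [first:last(len)] [ack N] win W <options> ...
LINE_RE = re.compile(
    r"^(?P<time>\S+) (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+)"
    r"(?: (?P<first>\d+):(?P<last>\d+)\((?P<length>\d+)\))?"
    r"(?: ack (?P<ack>\d+))?"
    r" win (?P<win>\d+)"
)


def summarize(lines):
    """Return per-sender totals: payload bytes, last ack sent, minimum window."""
    stats = defaultdict(
        lambda: {"payload_bytes": 0, "last_ack": None, "min_win": None}
    )
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip banners, counters, and anything unrecognized
        s = stats[m.group("src")]
        if m.group("length") is not None:
            s["payload_bytes"] += int(m.group("length"))
        if m.group("ack") is not None:
            s["last_ack"] = int(m.group("ack"))
        win = int(m.group("win"))
        s["min_win"] = win if s["min_win"] is None else min(s["min_win"], win)
    return dict(stats)


# The last three packets of the trace in this post, used as sample input.
SAMPLE = [
    "16:45:30.058938 localhost.54412 > localhost.54409: . ack 4287 win 30635 "
    "<nop,nop,timestamp 123647468 123647464> (DF) [ttl 0]",
    "16:45:30.064350 localhost.54409 > localhost.54412: P 4287:4353(66) ack 181 "
    "win 32767 <nop,nop,timestamp 123647468 123647468> (DF) [ttl 0]",
    "16:45:30.098935 localhost.54412 > localhost.54409: . ack 4353 win 30569 "
    "<nop,nop,timestamp 123647472 123647468> (DF) [ttl 0]",
]

if __name__ == "__main__":
    for endpoint, s in sorted(summarize(SAMPLE).items()):
        print(endpoint, s)
```

To run it over a whole capture, save the tcpdump text to a file and pass the open file object to `summarize` in place of `SAMPLE`. Lines that do not match the pattern are skipped, so the banner and the packet counters do not need to be stripped first.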