This is a discussion on recursive-clients queue size & clean-up within the Bind Users forums, part of the DNS and Related Forums category.
Are there any guidelines on how to size the recursive-clients queue?
I run a public recursive server handling around 2000 req/sec. Is each slot in the recursive-clients queue cleaned up once the timeout expires if no response arrives, or do some slots stay occupied longer? It seems to me that once I reach this limit there is no real way back to a stable BIND. All the CPU gets used, and even if I leave it overnight, when traffic sometimes drops as low as 300-400 req/sec, it does not recover: the messages still keep appearing from time to time, CPU usage stays very high (out of proportion to the number of incoming requests), and the number of requests logged to the query.log file is only about half of what the box is really supposed to receive (it looks like BIND or the OS is dropping traffic). There is no weird traffic; maybe there was a weird spike, but the server should recover from that.

When I stop and start the service, everything resumes: CPU drops, traffic returns to its normal rate rather than about half of it as during the problem, and the recursive-clients queue no longer overflows.

I recently moved to Solaris 9 with the latest patches (I had Solaris 8 before). I have tried several ways of compiling BIND and even several BIND versions: single-threaded, multithreaded, 32-bit code, 64-bit code. I still face this problem from time to time. I managed to trace it back a little: there always seems to be a traffic spike before the problem occurs, like a temporary flood (say, for a few seconds or minutes, 500-600 req/sec for an unreachable domain). The recursive-clients queue fills up and does not really recover afterwards.

The server is an e280 with 2 x CPU Sparc3, running bind 9.2.1 and 9.2.3.

Does rndc flush also flush the recursive-clients queue?

If I assume a 90-second timeout for each slot in the queue, then about 11 unreachable req/sec will fill a 1000-slot queue in 90 seconds. Once it is full, how will it recover? Unless the traffic contains fewer than 11 unreachable req/sec, it cannot recover.
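The arithmetic above can be written down as a small sketch. It rests on the assumption made in this post, not on confirmed BIND behaviour: every unanswered query holds one slot for the full timeout (taken here as 90 seconds) and the slot is then freed. Under that assumption the steady-state number of occupied slots is simply rate times hold time (Little's law). The function and constant names are illustrative only.

```python
import math

# Assumptions from the post, not confirmed BIND internals:
TIMEOUT_SEC = 90     # assumed time an unanswered query holds a slot
QUEUE_SLOTS = 1000   # queue size used in the example above


def slots_needed(unanswered_per_sec: float, timeout_sec: float) -> int:
    """Steady-state occupied slots: arrival rate times hold time (Little's law)."""
    return math.ceil(unanswered_per_sec * timeout_sec)


def sustainable_rate(slots: int, timeout_sec: float) -> float:
    """Highest rate of unanswered queries a queue of this size can absorb."""
    return slots / timeout_sec


if __name__ == "__main__":
    print(f"{QUEUE_SLOTS} slots absorb about "
          f"{sustainable_rate(QUEUE_SLOTS, TIMEOUT_SEC):.1f} unanswered req/sec")
    for rate in (11, 100, 200, 300):
        print(f"{rate} unanswered req/sec -> "
              f"{slots_needed(rate, TIMEOUT_SEC)} slots")
```

If slots really are released after the timeout, this model also implies that a full queue should drain within one timeout period once the rate of unanswered queries falls below the sustainable rate, which is not what is observed here.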
How many such requests are there in public traffic arriving at a rate of 2000 req/sec? There will definitely be 11 of them, and not just eleven but IMHO 100 (one hundred), maybe 200 or 300. What should the queue size be, then? 300/11*1000 = 27272 (twenty-seven thousand)?

I posted a similar issue some time back but could not draw a conclusion from the answers. Does this really seem such a minor thing, or is there really no clue how to set the queue size because it is not clear how the queue is used? Do we need commercial support to get somebody to answer: yes, this is how the queue is managed, these are the guidelines for setting it, and this is how to recover if it becomes full?

The queue size does not depend purely on the number of users or requests, but also on the weirdness of the traffic, which is becoming very weird, especially in a public environment. If there were guidelines and a general understanding of how the queue is managed, each of us could tune it to our own traffic characteristics.

Ladislav