Re: Strange BIND9 issue

01-14-2005
Guido Roeskens
 

Will Yardley wrote:
> On 2005-01-12, Brad Knowles <brad@stop.mail-abuse.org> wrote:
>
>>At 7:42 PM -0600 2005-01-11, Will Yardley wrote:

>
>
>>> radon: 04:56pm# while true ; do dig yahoo.com @66.33.216.127 | grep Query ; done
>>> ;; Query time: 790 msec
>>> ;; Query time: 868 msec
>>> ;; Query time: 753 msec
>>> ;; Query time: 798 msec
>>> ;; Query time: 982 msec
>>> ;; Query time: 1178 msec
>>> ;; Query time: 1284 msec
>>> ;; Query time: 1291 msec
>>> ;; Query time: 1208 msec
>>> ;; Query time: 738 msec

>
>
>>You're completely by-passing the local caching BIND nameserver here.
>>You're going directly to the nameserver specified in the command
>>line, and the local copy of BIND is not involved at all. Unless that
>>is the public IP address of your machine, but then queries to
>>127.0.0.1 or the public IP address should be going to the same copy of
>>BIND running on the same machine, and I don't understand why this
>>would result in the kind of difference you're seeing.

>
>
> Right - same copy of BIND, same machine, which is why I find it so odd.
>
> Sorry I wasn't more clear about that (though the IP was listed as a
> listen-on address in the config snippet).
>
>
>>Have you seen this kind of behaviour regardless of which IP address
>>you query?

>
>
> When the problem comes up, querying via the machine's loopback interface
> doesn't seem to have the problem; querying the public IP (whether from
> the machine itself, which should be essentially the same thing, or from
> outside) experiences /huge/ timeouts. The actual problem we see is that
> other services (ssh, smtp, etc.) then become slow because DNS lookups on
> machines with this nameserver listed first in resolv.conf are slower
> (going to the second nameserver listed).
>
> Barry's response is interesting, but we're not doing any funny
> networking stuff here (definitely no anycast) - and I'm testing the
> query from the local machine, so I think it should be exactly the same
> whether the loopback interface or a public interface is being queried.

It's not only a question of anycast.
If a single IP gets "hammered" with queries, the OS's receive queue for
that IP can fill up, and packets get dropped.
You could test this by configuring another IP on the interface.
If responses are slow or missing on the "known" IP but normal or
fast on the second IP, the OS queue is probably overflowing.
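
A rough sketch of that test, in case it helps. The first helper averages the
"Query time" lines from dig output; compare_addrs (a name made up for this
example) runs the same query ten times against each address you pass. The
count of 10 and the 2-second timeout are arbitrary assumptions, and the
second IP is whatever alias you add to the interface.

```shell
#!/bin/sh
# Average the "Query time" values found in dig output read from stdin.
# Prints the mean in msec (truncated to an integer), or "no-answer" if
# no "Query time" line was seen, e.g. because every query timed out.
avg_query_time() {
    awk '/Query time:/ { sum += $4; n++ }
         END { if (n) printf "%d\n", sum / n; else print "no-answer" }'
}

# Usage: compare_addrs NAME ADDR [ADDR...]
# Queries NAME ten times against each server address and prints one
# line per address: the address, a tab, and the average query time.
compare_addrs() {
    name=$1
    shift
    for addr in "$@"; do
        avg=$(
            i=0
            while [ "$i" -lt 10 ]; do
                # One attempt per query, 2 second timeout (assumed values).
                dig "$name" "@$addr" +tries=1 +time=2
                i=$((i + 1))
            done | avg_query_time
        )
        printf '%s\t%s\n' "$addr" "$avg"
    done
}
```

For example: compare_addrs yahoo.com 127.0.0.1 66.33.216.127 <second-ip>.
If only the busy address shows a much higher average (or no-answer), that
points at the queue for that address rather than at BIND itself.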
>
>
>>> recursive-clients 6000;
>>> tcp-clients 1500;
>>> max-cache-size 150000000;

>
>
>>Why have you defined these? Why not make the configuration simpler
>>and disable them. If this fixes your problem, then you know where to
>>look. If not, then you know to look elsewhere.

>
>
> The max-cache-size thing is an attempt to somewhat limit the amount
> of memory BIND sucks up... the default is unlimited. Setting it
> explicitly should just mean that old records get purged early, no? I
> can try disabling this if people really think it will help - before we
> switched to rbldnsd for the blocklists (pretty huge zones), we had to
> run two BIND instances and they were sucking down /huge/ amounts of
> memory between them.

I wouldn't disable max-cache-size.
If you disable it, BIND can eat up all the memory, and the
server will start to swap.
>
> The recursive-clients and tcp-clients were added when this problem or a
> similar problem came up just to make sure that we weren't hitting one of
> those limits. Re-reading the ARM, it looks like I somewhat misunderstood
> the meaning of this option, though. I'll try disabling both of those and
> see if that helps...

The defaults are quite low (tcp 100, udp ?).
We hit the limits all the time, and increasing them helps.
If you set them too high, the server will have problems as well.
Our numbers are similar to yours (or higher).
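
One way to see whether you are actually hitting those limits is to look at
the client counters in "rndc status". The sketch below assumes your version
prints lines of the form "recursive clients: 12/1000" and
"tcp clients: 0/100"; some versions print three numbers for recursive
clients, so the helper takes the first number as the current count and the
last as the limit. client_usage is just a name made up for this example.

```shell
#!/bin/sh
# Read `rndc status` output on stdin and print, for each client counter,
# its name followed by current/limit.
client_usage() {
    awk -F': ' '/^(recursive|tcp) clients:/ {
        # $2 looks like "12/1000" or "12/900/1000"; keep first and last.
        n = split($2, parts, "/")
        printf "%s %s/%s\n", $1, parts[1], parts[n]
    }'
}
```

Run it as: rndc status | client_usage. If the current count sits at or near
the limit while the slowdown is happening, the limit is worth raising.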
>
>
>>> /* only allow queries from internal networks */
>>> allow-query { dh_known_networks; 127.0.0.0/8; };

>
>
>>Well, that would pretty much kill you from doing queries to the
>>external IP address.

>
>
> Well a), I was doing the query from the local machine, and b),
> dh_known_networks is listed in the ACL section that I snipped from the
> config... it's an ACL which lists all the networks allowed to query the
> machine.
>
> Thanks...
>
> w
>
>
>

Guido


