This is a discussion on Re: pfil2.1.11 performance issues - pfil_printmchain sprintf within the IPFilter forums, part of the System Security and Security Related category.
> > Ian Donaldson wrote:
> > I have a pair of Sun Fire X2100M2's connected via 100M eth switches
> > (yeah, crippling gig-E) and running pfil 2.1.11, ip_fil4.1.16 and
> > was noticing significant TCP throughput performance differences
> > for traffic between various ethernet interfaces on the two systems.
> >
> > (both systems running Solaris 10/x86 6/06 with 26 Feb recommended
> > patch cluster, NVIDIA add-on driver patch 122530-02 for nge)
> >
> > eg: system1 bge0 -> system2 bge0   1700KB/s
> >     system1 bge1 -> system2 bge1  11000KB/s
> >     system1 nge1 -> system2 nge1  11000KB/s
> >
> > With top I noticed a significant portion of system time being consumed
> > in the bge0 test (like 50%).
> >
> > Using
> >
> > lockstat -kIi997 sleep 10
> >
> > What is curious though is that this problem only manifests itself
> > on one of the 3 interfaces I have enabled in the system, suggesting
> > something else is broken, as I would have thought that all interface
> > traffic would pass thru the same code.
> > (yes I've verified pfil module is pushed on all interfaces)
> >
> > It doesn't manifest itself on another X2100M2 system that only has
> > bge0 enabled but.
> >
>
> Are you saying that where bge1 is used but not bge0, the problem doesn't
> arise?
> That would be strange! If it happened when either bge0 or bge1 was
> being used,
> I could understand that...kinda...it'll be because the bge driver is
> communicating
> with IP "differently" because pfil is there in between.
>

Yep, as stated. Traffic between bge1 and nge1 on both systems was fine;
only bge0 was affected.

Since then I've also discovered that this problem exists on some of our
Solaris 9 systems that run similar ipf/pfil versions.
ie: pfil_2.1.9 ip_fil4.1.13
but not in all combinations. eg:

- Sun Fire V60x; no problems at all. Can't reproduce it on either e1000g0 or e1000g1.
  (Solaris 9/x86 2003/08 base with May 2005 recommended patch cluster)
  lockstat doesn't show pfil_printmchain being called at all.
- Sun Netra T1 105 sparc; it's 100% reproducible on both interfaces (hme0 and hme1).
  (Solaris 9 sparc 2003/12 base with May 2005 recommended patch cluster)
  lockstat shows vsnprintf and pfil_printmchain at the top of usage.
  Throughput is abysmal; 300KB/sec. Kernel CPU usage 97%.
- Sun Fire V100 sparc; no problems at all. Can't reproduce it on either dmfe0 or dmfe1.
  pfil_printmchain showed only a handful of calls in the trace.
  (identical OS/patch base as for the Netra)
  Tested two similar systems. Same results.

Note that the ipf/pfil on the sparc systems were absolutely identical;
installed from the same package I built.

So what other factors can control whether pfil_printmchain is called?
(couldn't spot anything in the code myself; and I hate an unsolved mystery
like this as it's probably related to another bug which could be way more serious)

Ian D
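
To put the X2100M2 figures quoted above in perspective, here is a rough sketch that expresses each measured rate as a share of the 100 Mbit/s link rate. It assumes "KB" means 1000 bytes and ignores Ethernet/IP/TCP framing overhead; the function and variable names are made up for illustration. The Netra figure is left out because the link speed of that test is not stated in the thread.

```python
# Link rate stated in the thread: 100 Mbit/s switches.
LINK_MBIT_PER_S = 100

# Assumption: "KB" means 1000 bytes. With 1024-byte KB the results shift slightly.
BYTES_PER_KB = 1000


def link_utilisation(kb_per_s, link_mbit_per_s=LINK_MBIT_PER_S):
    """Return measured throughput as a fraction of the raw link rate."""
    link_bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return kb_per_s * BYTES_PER_KB / link_bytes_per_s


# Throughput figures quoted in the thread, in KB/s.
measurements = {
    "bge0 -> bge0": 1700,
    "bge1 -> bge1": 11000,
    "nge1 -> nge1": 11000,
}


def report(results):
    """Map each test name to its link utilisation in percent (one decimal)."""
    return {name: round(100 * link_utilisation(kb), 1) for name, kb in results.items()}


if __name__ == "__main__":
    for name, pct in report(measurements).items():
        print(f"{name}: {pct}% of link rate")
```

Under those assumptions the bge1 and nge1 paths run close to wire speed, while bge0 sits at a small fraction of it, which fits the picture of a per-packet CPU cost rather than a link limit.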