Re: apache on linux claims "Too many open files"
> Is there any kind of testing I can do to narrow down the problem? I'm
> still at a loss.

one obvious approach would be to see whether you can stress test one of your other machines into showing the same problem, which would allow you to experiment without jeopardizing production. you might use ab on the most common non-trivial file on your site for starters. if that doesn't help, you might try to model a test workload after your real access stats.

also you might want to have some scripts in place to collect data from production over time: maybe fs.file-nr, fs.inode-nr and the number of apache processes running. this might help to understand the problem when looking at the conditions under which the errors occur.

that being said, i think the stats you got for inode-nr are completely fubar. the docs state nr_inodes (total inodes allocated) and nr_free_inodes (total inodes free) as the two values given by fs/inode-nr, and give the formula to calculate used inodes as:

nr_used_inodes = nr_inodes - nr_free_inodes

unfortunately this means this number is negative on your machine for both sets of data, which is rather improbable :-( also the fact that you are supposed to have more free inodes at the time of the problem than when all is fine is a bit confusing.

some googling revealed that there has indeed been a known kernel bug which led to a negative value in inode-nr, but this was for the second value (nr_free_inodes), and it was fixed in 2.4.5 or so and therefore should not be present in 2.4.9:

http://groups.google.de/groups?q=fs....psu.edu&rnum=4

but just in case, could you compare the values from /proc/sys/fs/inode-state with those from the sysctl output to make sure they're not reversed and that we're not seeing that problem here after all?

well, it's quite obscure. if i were you i'd get a new kernel (latest from redhat or vanilla tarball from kernel.org, in that order), install it on the test system, see whether it still works in principle and then see whether the problem goes away.

joachim
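a rough sketch of the kind of collection script suggested above, in case it helps. it assumes python 3 is available on the box and that the apache processes are named httpd; neither is confirmed in this thread, and the function names are made up for illustration. it reads fs.file-nr and fs.inode-nr from /proc, applies the used-inodes formula quoted above, and counts processes by name. the file-nr fields are kept as raw numbers because their meaning depends on the kernel version.

```python
#!/usr/bin/env python3
"""Take one sample of fs.file-nr, fs.inode-nr and the Apache process count."""
import os
import time


def parse_counters(text):
    """Turn the whitespace-separated integers of a /proc/sys/fs file into a list."""
    return [int(field) for field in text.split()]


def used_inodes(inode_nr_text):
    """Apply nr_used_inodes = nr_inodes - nr_free_inodes to the inode-nr contents."""
    nr_inodes, nr_free_inodes = parse_counters(inode_nr_text)[:2]
    return nr_inodes - nr_free_inodes


def count_processes(name, proc_root="/proc"):
    """Count processes whose 'Name:' line in <proc_root>/<pid>/status equals name."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                first_line = f.readline()
        except OSError:
            continue  # process exited between listdir() and open()
        if first_line.split(":", 1)[-1].strip() == name:
            count += 1
    return count


def read_text(path):
    with open(path) as f:
        return f.read()


def sample(process_name="httpd"):
    """Collect one snapshot; 'httpd' is an assumed process name, adjust as needed."""
    inode_text = read_text("/proc/sys/fs/inode-nr")
    return {
        "time": int(time.time()),
        "file_nr": parse_counters(read_text("/proc/sys/fs/file-nr")),
        "inode_nr": parse_counters(inode_text),
        "used_inodes": used_inodes(inode_text),  # negative means bogus stats
        "procs": count_processes(process_name),
    }


if __name__ == "__main__":
    print(sample())
```

run from cron every minute or so and appended to a log, this would give you something to line up against the timestamps of the "Too many open files" errors.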