Need expert advice on configuring Apache for a stats site (Linux Web Servers forum, Web Server and Related Forums category)
Ok, to make a long story short, I run a site offering (basic) web stats. People can include on their pages a specific gif generated for their site and hosted on mine. I log access to those files in a specific log file, which is then analyzed to produce statistics. Some very popular sites with tons of daily page views are using this service.

I have the current config:

    MaxKeepAliveRequests 100
    KeepAliveTimeout 5
    MinSpareServers 20
    MaxSpareServers 40
    MaxClients 256
    MaxRequestsPerChild 60

I get approx 2 million requests on those gif files daily, which is not all that much overall, but those requests come from many different clients. Basically, someone connects to a site using the gif, opens a connection to my server, his browser (generally) caches the gif file, and I often don't even get another connection from this IP for another X hours (I generate stats on a daily basis so that's fine). The result is that the number of Apache slots open is permanently quite high (around 240). I will probably need to recompile Apache, overriding the hardcoded 256 limit, soon, but that's ok.

Currently, as I said, the gif file is cached by the client. This allows me to generate unique IP stats, visitors' origin etc., but doesn't allow me to generate page view statistics. In order to achieve this, I have installed mod_headers and mod_expires, and have added directives so that the headers on the gif files have must-revalidate and no-cache. This would, in theory, make sure that any time a page including a gif is viewed, a request is sent to my server and logged, thus allowing me to generate page view stats.

Unfortunately when I turned on this config, my server instantly saturated. The number of open slots reached 256 with (I suppose) an endless line in the queue, and my server was basically inaccessible.

Now here's my question: how should I go about handling such a situation? Should I simply recompile with a higher max clients?

Having the keepalive timeout down to a lower value has helped me keep the number of open slots from getting too high so far, but that was in a state where clients were downloading the file just once. Now that they would send a request for each page view, shouldn't I have it higher again?

The server also hosts a few websites, for which reason I cannot set the config only for those gif files (it must suit standard website use as well). Is there a way to have different keepalive values depending on what files are accessed (or to close a connection if it accessed a certain file)? Or am I totally missing the point and should I do something completely different?

Thanks for reading and thanks even more for any suggestions.

Cheers,
Quentin
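For reference, the caching change described above might look roughly like the fragment below. This is only a sketch, not the poster's actual configuration: the /stats/ URL path is a hypothetical location for the tracking images, and it assumes mod_headers and mod_expires are both loaded.

```apache
# Sketch only. Assumes mod_headers and mod_expires are loaded and that the
# tracking gifs are served from a hypothetical /stats/ URL path.
<LocationMatch "^/stats/.*\.gif$">
    # mod_expires: mark the image as expired at the moment it is served.
    ExpiresActive On
    ExpiresByType image/gif "access plus 0 seconds"

    # mod_headers: ask browsers and proxies to revalidate on every use.
    # How this combines with the Cache-Control value mod_expires emits
    # should be verified by inspecting the actual response headers.
    Header set Cache-Control "no-cache, must-revalidate"
</LocationMatch>
```

Scoping the directives to the tracking images only leaves the caching behaviour of the other sites on the same server untouched.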
Quentin said the following on 16/09/2004 11:03:

I'm no expert, but I have an opinion about this.

[snip]
> Currently as i said the gif file is cached by the client. This allows
> me to generate unique IP stats, visitors origin etc., but doesn't
> allow me to generate pageviews statistics.
> In order to achieve this, I have installed mod_header and mod_expires,
> and have added directives so that the headers on the gif files have
> must-revalidate and no-cache. This would, in theory, make sure that
> any time a page including a gif is viewed, a request is sent to my
> server and logged, thus allowing me to generate page views stats.

Please don't do this! Those gif files are *supposed* to be cached. People with slow modems especially will not be thankful when they continually have to reload them.

> Unfortunately when i turned on this config, my server instantly
> saturated. The number of open slots reached 256 with (i suppose) an
> endless line in queue and my server was basicly inaccessible.

Yes, that might be a problem for you, but the way I look at this, it's not a big problem. You mess with standard settings and thereby you decide for others what their browser should cache.

> Now here's my question, how should i go about handling such a
> situation ? Should I simply recompile with a higher max clients ?

No, don't make the users of the sites you host the gifs for suffer.

[snip]
> connection if it accessed a certain file) ? Or am I totally missing
> the point and should i do something completely differently ?

Just find another solution to your problem, don't try to change how parts of the internet are supposed to work.

Let them do the stat thing, or let them submit (parts of) their logs to you.

Or maybe, but I don't like this either, you could use a 1x1 pixel gif which doesn't put much strain on other users' bandwidth. But I doubt it will solve your problem with the number of connections.

--
Regards
Harrie
Harrie,

thanks for your response.

Actually, I AM using 1*1 gifs that are less than 100 bytes. It does not make a difference to users to have 100 more bytes to download on each page when those pages are generally more like 30kB-50kB even without the images... Plus the gifs are generally put at the bottom of the page, which makes sure that the user can actually see the content even while it's getting downloaded.

> Just find another solution to your problem, don't try to change how
> parts of the internet are supposed to work.

I'm sorry but I disagree with this. The internet is not "supposed to work" in a single way. Of course it's better to have regular images cached, but a stats service is much less useful if they are cached than if they are not. The standard defines the way to tell the browser *how* to cache images for some time *or* not to cache them; it doesn't say that you *must* cache them.
Quentin said the following on 19/09/2004 22:14:

[snip]
> Actually, i AM using 1*1 gifs that are less than 100 bytes. It does
> not make a difference to users to have 100 more bytes to download on
> each page when those pages are generally more like 30kB-50kB even

That's not a fair comparison, since that 30-50 KB will be cached. If a user of those sites goes back to a page which was already visited, the image would still have to be downloaded.

But is a 1x1 gif without color 100 bytes? I don't know much about pictures, but it still sounds like too much to me.

> without the images... Plus the gifs are generally put on the bottom of
> the page, which makes sure that the user can actually see the content
> even when it's getting downloaded.

But they still have to download it. It might not be much, but I still don't like it.

>> Just find another solution to your problem, don't try to change how
>> parts of the internet are supposed to work.
>
> I'm sorry but i disagree with this. The internet is not "supposed to
> work" in a single way. Of yourse it's better to have regular images
> cached, but it makes a stats service much less useful if they are than
> if they are not. The standard defines the way to tell the browser
> *how* to cache images for some time *or* not to cache them, it doesn't
> tell that you *must* cache them.

That's the problem with stat services. Like I said, the stats can be generated from their logs; there's no need for the image, except that you would be out of business. The standards certainly don't say that stat services should disable caching on pics.

[snip]

--
Regards
Harrie
> Now here's my question, how should i go about handling such a
> situation ? Should I simply recompile with a higher max clients ?

Yes. By disabling the caching for this image and telling the client that it must hit your web server every time, you are increasing (dramatically) the number of connections required by your web server.

> Having the keepalive timeout down to a lower value helped me in
> maintaining the number of open slots not too high so far, but that was
> in a state where clients were just downloading a single time the file,
> now that they would send a request for each page view, shouldn't I
> have it higher again ?

Yeah, I wouldn't reduce the keepalive, for exactly the reason you mentioned: the client will be hitting your web server with every page. As you said, 100 bytes is really small, not much to transfer. In fact, the packets required to create the TCP/IP connection will be bigger than the image itself, so there is more overhead in creating the connection than in downloading the image.

> The server also hosts a few websites, for which
> reason I cannot set the config only for those gif files (it must suit
> a standard website use as well), is there a way to have different
> keepalive values depending on what files are accessed (or to close a
> connection if it accessed a certain file) ?

According to httpd.apache.org, the KeepAliveTimeout directive has a context of "server config, virtual host". So the only way to set it more specifically than for the whole server is to do it in a virtual host. If the other web sites you're hosting use different DNS names, IPs and/or ports, then you can easily create virtual hosts and give them different KeepAlive values.

> Or am I totally missing
> the point and should i do something completely differently ?

I do tend to agree with Harrie on this one. I provide a lot of stats for intranet web servers at work. But you have to realise that web stats are very elusive and fuzzy. Here's a good read on that track:

http://www.goldmark.org/netrants/webstats/

Somewhat cynical, but it makes some good points. I don't think web stats are totally useless, but don't expect them to be 100% accurate either, no matter what you do.

-Norm
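To illustrate the virtual host approach, here is a rough sketch. The host names and document roots are made up, the timeout values are arbitrary, and it assumes an Apache version whose documentation lists "virtual host" as a valid context for KeepAliveTimeout (worth checking for older releases). The last directive shows a possible alternative for the "close the connection after a certain file" part of the question: the special `nokeepalive` environment variable, set here through mod_setenvif, which is assumed to be loaded.

```apache
# Sketch only: hypothetical host names and paths, arbitrary timeout values.
NameVirtualHost *:80

# Stats images: release the slot quickly after each hit.
<VirtualHost *:80>
    ServerName stats.example.com
    DocumentRoot /var/www/stats
    KeepAlive On
    KeepAliveTimeout 2
</VirtualHost>

# Ordinary web sites: a longer keepalive for multi-object pages.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/site
    KeepAlive On
    KeepAliveTimeout 15
</VirtualHost>

# Possible alternative when separate host names are not an option:
# disable keepalive only for requests matching the tracking gifs
# (the /stats/ path is an assumption, not taken from the thread).
SetEnvIf Request_URI "^/stats/.*\.gif$" nokeepalive
```

The two approaches are independent; the `SetEnvIf` line would normally be used instead of the separate stats virtual host, not together with it.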
> > Now here's my question, how should i go about handling such a
> > situation ? Should I simply recompile with a higher max clients ?
>
> Yes. By disabling the caching for this image and telling the client
> that they must hit your web server evertime you are increasing
> (dramatically) the number of connections required by your web server.

i'd say about an order of magnitude or two...

> > Having the keepalive timeout down to a lower value helped me in
> > maintaining the number of open slots not too high so far, but that was
> > in a state where clients were just downloading a single time the file,
> > now that they would send a request for each page view, shouldn't I
> > have it higher again ?
>
> Yeah, I wouldn't reduce the Keepalive for exactly the reason you
> mentioned: The client will be hitting your web server with every page.
> As you said, 100bytes is really small, not much to transfer. In fact,
> the packets required to create the TCP/IP connection will be bigger
> than the image itself thus more overhead to create the connection then
> to d/l the image.

it depends on what the limiting factor is. your argument is valid network traffic-wise, but if the limiting factor is the number of worker processes (or rather system memory, which is equivalent), you want to have a very short keepalive in order to free the process for the next customer instead of having it linger idly while waiting for follow-up requests from the last one.

the best bet for the original poster would imho be an apache2 with a threaded mpm like worker, a very high MaxClients setting (threads don't eat up much mem because they share a common image) and a two to three minute keepalive to minimise protocol overhead (now we can handle lots of idle worker threads since they don't eat much mem).

joachim
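A setup along the lines joachim describes could be sketched as follows. This assumes Apache 2.0 built with the worker MPM; every number here is an illustrative assumption rather than a tuned or recommended value, and the right limits depend on available memory and real traffic.

```apache
# Sketch only: Apache 2.0 with the worker MPM, illustrative numbers.
<IfModule worker.c>
    # Hard upper bounds; these must come before the settings they limit.
    ServerLimit        16
    ThreadLimit        64

    # 16 processes x 64 threads = 1024 simultaneous connections.
    ThreadsPerChild    64
    MaxClients       1024

    StartServers        4
    MinSpareThreads    64
    MaxSpareThreads   256
</IfModule>

# Long keepalive, as suggested above: idle connections now occupy
# threads rather than whole processes.
KeepAlive On
KeepAliveTimeout 120
```

With the worker MPM, MaxClients cannot exceed ServerLimit multiplied by ThreadsPerChild, which is why the three values are chosen together here. A long keepalive also means each idle client holds a thread for up to two minutes, so the thread count has to be sized for idle connections as well as active ones.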