need expert advice on configuring apache for a stats site

  #1 (permalink)  
Old 09-16-2004
Quentin
 
Posts: n/a
Default need expert advice on configuring apache for a stats site

OK, to make a long story short, I run a site offering (basic) web
stats.
People can include on their pages a specific gif generated for their
site and hosted on mine. I log access to those files in a specific log
file, that is then analyzed to produce statistics.
Some very popular sites with tons of daily page views are using this
service.
I have the current config:

MaxKeepAliveRequests 100
KeepAliveTimeout 5
MinSpareServers 20
MaxSpareServers 40
MaxClients 256
MaxRequestsPerChild 60

I get approx. 2 million requests for those gif files daily, which is not
all that much overall, but those requests come from many different
clients. Basically, someone visits a site using the gif, their browser
opens a connection to my server and (generally) caches the gif file,
and I often don't even get another connection from that IP for another
X hours (I generate stats on a daily basis, so that's fine).
The result is that the number of busy Apache slots is permanently
quite high (around 240). I will probably need to recompile Apache
to override the hardcoded 256 limit soon, but that's OK.
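
A back-of-the-envelope check of those numbers, written as comments around the two relevant directives. It is only a rough sketch: it assumes traffic is spread evenly over the day (it never is) and that each browser leaves its connection open until the server times it out.

```apache
# 2,000,000 requests/day / 86,400 s/day = about 23 requests per second
# on average.
# Apache 1.3 ties up one child process (slot) per open connection, and a
# keep-alive connection holds its child for up to KeepAliveTimeout seconds
# after the last request:
#     about 23 new connections/s * 5 s = about 116 slots busy on average
# Daytime peaks at roughly twice the daily average would already account
# for the ~240 busy slots seen against the 256 ceiling.
KeepAliveTimeout 5
MaxClients 256
```

By the same reasoning, anything that multiplies the number of connections per visitor, or lengthens the time each one is held, pushes the slot count past 256 very quickly.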

Currently, as I said, the gif file is cached by the client. This allows
me to generate unique IP stats, visitor origins etc., but doesn't
allow me to generate page view statistics.
To achieve this, I have installed mod_headers and mod_expires,
and have added directives so that the responses for the gif files carry
must-revalidate and no-cache. This should, in theory, make sure that
any time a page including a gif is viewed, a request is sent to my
server and logged, thus allowing me to generate page view stats.
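
For illustration, directives of that kind might look roughly like this. It is a sketch, not the configuration actually used in this post: the /counters/ path is hypothetical, and it assumes mod_headers and mod_expires are loaded.

```apache
# Hypothetical location; the point is to limit this to the counter gifs.
<Location /counters/>
    ExpiresActive On
    # Mark the response as already expired at the moment it is served.
    # (mod_expires may also add its own max-age=0 to Cache-Control.)
    ExpiresByType image/gif "access plus 0 seconds"
    # Let browsers keep a copy but make them check with the server each time.
    Header set Cache-Control "no-cache, must-revalidate"
</Location>
```

A revalidating browser normally sends a conditional request and gets back a bodyless 304 Not Modified. That still produces a line in the access log, so the log analysis has to count 304 responses as well as 200s.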

Unfortunately, when I turned on this config, my server instantly
saturated. The number of open slots reached 256 with (I suppose) an
endless queue behind them, and my server was basically inaccessible.

Now here's my question: how should I go about handling such a
situation? Should I simply recompile with a higher max clients?
Keeping the keepalive timeout low has helped me keep the number of
open slots from getting too high so far, but that was when clients
downloaded the file only once. Now that they would send a request for
each page view, shouldn't I raise it again? The server also hosts a
few websites, so I cannot tune the config for those gif files alone
(it must suit standard website use as well). Is there a way to have
different keepalive values depending on which files are accessed (or
to close a connection if it accessed a certain file)? Or am I totally
missing the point, and should I do something completely different?
Thanks for reading and thanks even more for any suggestions.

Cheers,
Quentin
  #2 (permalink)  
Old 09-17-2004
Harrie
 
Posts: n/a
Default Re: need expert advice on configuring apache for a stats site

Quentin said the following on 16/09/2004 11:03:

I'm no expert, but I have an opinion about this.

<snip>
> Currently, as I said, the gif file is cached by the client. This allows
> me to generate unique IP stats, visitor origins etc., but doesn't
> allow me to generate page view statistics.
> To achieve this, I have installed mod_headers and mod_expires,
> and have added directives so that the responses for the gif files carry
> must-revalidate and no-cache. This should, in theory, make sure that
> any time a page including a gif is viewed, a request is sent to my
> server and logged, thus allowing me to generate page view stats.


Please don't do this! Those gif files are *supposed* to be cached.
Especially people with slow modems will not be thankful when they
continually have to reload them.

> Unfortunately, when I turned on this config, my server instantly
> saturated. The number of open slots reached 256 with (I suppose) an
> endless queue behind them, and my server was basically inaccessible.


Yes, that might be a problem for you, but the way I see it, it's not
a big problem. You are messing with standard settings and thereby
deciding for others what their browsers should cache.

> Now here's my question: how should I go about handling such a
> situation? Should I simply recompile with a higher max clients?


No, don't make the users of the sites you host the gifs for suffer.

<snip>
> to close a connection if it accessed a certain file)? Or am I totally
> missing the point, and should I do something completely different?


Just find another solution to your problem, don't try to change how
parts of the internet are supposed to work.

Let the site owners do the stats themselves, or let them submit (parts
of) their logs to you.

Or maybe, though I don't like this either, you could use a 1x1 pixel gif,
which doesn't put much strain on other users' bandwidth. But I doubt
it will solve your problem with the number of connections.

--
Regards
Harrie
  #3 (permalink)  
Old 09-19-2004
Quentin
 
Posts: n/a
Default Re: need expert advice on configuring apache for a stats site

Harrie,
thanks for your response.
Actually, I AM using 1x1 gifs that are less than 100 bytes. It does
not make a difference to users to have 100 more bytes to download on
each page when those pages are generally more like 30-50 kB even
without the images... Plus the gifs are generally put at the bottom of
the page, which makes sure that the user can see the content
even while the gif is still downloading.

> Just find another solution to your problem, don't try to change how
> parts of the internet are supposed to work.


I'm sorry, but I disagree with this. The internet is not "supposed to
work" in a single way. Of course it's better to have regular images
cached, but a stats service is much less useful if they are cached than
if they are not. The standard defines how to tell the browser
to cache images for some time *or* not to cache them at all; it doesn't
say that you *must* cache them.


  #4 (permalink)  
Old 09-19-2004
Harrie
 
Posts: n/a
Default Re: need expert advice on configuring apache for a stats site

Quentin said the following on 19/09/2004 22:14:

[snip]
> Actually, I AM using 1x1 gifs that are less than 100 bytes. It does
> not make a difference to users to have 100 more bytes to download on
> each page when those pages are generally more like 30-50 kB even


That's not a fair comparison, since that 30-50 KB will be cached. If a
user of those sites goes back to a page which was already visited, the
image would still have to be downloaded.

But is a 1x1 gif without color 100 bytes? I don't know much about
pictures, but it still sounds like too much to me.

> without the images... Plus the gifs are generally put at the bottom of
> the page, which makes sure that the user can see the content
> even while the gif is still downloading.


But they still have to download it. It might not be much, but I still
don't like it.

>>Just find another solution to your problem, don't try to change how
>>parts of the internet are supposed to work.

> I'm sorry, but I disagree with this. The internet is not "supposed to
> work" in a single way. Of course it's better to have regular images
> cached, but a stats service is much less useful if they are cached than
> if they are not. The standard defines how to tell the browser
> to cache images for some time *or* not to cache them at all; it doesn't
> say that you *must* cache them.


That's the problem with stat services. Like I said, the stats can be
generated from their logs; there's no need for the image, except that
you would be out of business. The standards certainly don't say
that stat services should disable caching of pics.

[snip]

--
Regards
Harrie
  #5 (permalink)  
Old 09-20-2004
Norman Ackroyd
 
Posts: n/a
Default Re: need expert advice on configuring apache for a stats site

> Now here's my question: how should I go about handling such a
> situation? Should I simply recompile with a higher max clients?


Yes. By disabling caching for this image and telling the client
that it must hit your web server every time, you are increasing
(dramatically) the number of connections your web server has to handle.
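
How the limit gets raised depends on the Apache version. A rough sketch follows; the figure of 1024 is purely illustrative and not a recommendation.

```apache
# Apache 1.3: the ceiling is the compile-time constant HARD_SERVER_LIMIT
# (256 by default), so a MaxClients above 256 needs a rebuild with a
# larger value defined.
#
# Apache 2.0/2.2 with the prefork MPM: no rebuild is needed. Raise
# ServerLimit together with MaxClients, with ServerLimit placed first.
<IfModule prefork.c>
    ServerLimit 1024
    MaxClients  1024
</IfModule>
```

Every child here is a full process, so it is worth checking that the chosen number of them actually fits in RAM before raising the limit.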

> Keeping the keepalive timeout low has helped me keep the number of
> open slots from getting too high so far, but that was when clients
> downloaded the file only once. Now that they would send a request for
> each page view, shouldn't I raise it again?


Yeah, I wouldn't reduce the keepalive, for exactly the reason you
mentioned: the client will be hitting your web server with every page.
As you said, 100 bytes is really small, not much to transfer. In fact,
the packets required to set up the TCP/IP connection will be bigger
than the image itself, so there is more overhead in creating the
connection than in downloading the image.

> The server also hosts a
> few websites, so I cannot tune the config for those gif files alone
> (it must suit standard website use as well). Is there a way to have
> different keepalive values depending on which files are accessed (or
> to close a connection if it accessed a certain file)?


According to httpd.apache.org, the KeepAliveTimeout directive has a
context of "server config, virtual host". So the only way to set it
more specifically than for the whole server is to do it in a virtual host.
If the other websites you're hosting use different DNS names, IPs
and/or ports, then you can easily create virtual hosts and give them
different keepalive values.
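
A sketch of what that could look like with name-based virtual hosts. The hostnames, paths and timeout values are placeholders, not taken from the original poster's setup.

```apache
# NameVirtualHost is needed on Apache 1.3/2.0/2.2 for name-based hosts.
NameVirtualHost *:80

# Counter gifs: release the slot quickly.
<VirtualHost *:80>
    ServerName stats.example.com
    DocumentRoot /var/www/stats
    KeepAlive On
    KeepAliveTimeout 2
</VirtualHost>

# Ordinary websites: a more conventional keep-alive.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/site
    KeepAlive On
    KeepAliveTimeout 15
</VirtualHost>

# Possible alternative without a separate host (requires mod_setenvif):
# the special "nokeepalive" variable makes Apache close the connection
# after a matching request. /counters/ is a hypothetical path; keep the
# pattern narrow so ordinary site images are not affected.
SetEnvIf Request_URI "^/counters/.*\.gif$" nokeepalive
```

The second approach comes closest to the "close a connection if it accessed a certain file" part of the question, but it is worth testing on the Apache version in use before relying on it.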

> Or am I totally
> missing the point, and should I do something completely different?


I do tend to agree with Harrie on this one. I provide a lot of stats
for intranet web servers at work. But you have to realise that web
stats are very elusive and fuzzy. Here's a good read on that topic:

http://www.goldmark.org/netrants/webstats/

Somewhat cynical, but it makes some good points. I don't think web stats
are totally useless, but don't expect them to be 100% accurate either,
no matter what you do.

-Norm
  #6 (permalink)  
Old 09-22-2004
Joachim Ring
 
Posts: n/a
Default Re: need expert advice on configuring apache for a stats site

> > Now here's my question: how should I go about handling such a
> > situation? Should I simply recompile with a higher max clients?

> Yes. By disabling caching for this image and telling the client
> that it must hit your web server every time, you are increasing
> (dramatically) the number of connections your web server has to handle.


i'd say about an order of magnitude or two...

> > Keeping the keepalive timeout low has helped me keep the number of
> > open slots from getting too high so far, but that was when clients
> > downloaded the file only once. Now that they would send a request for
> > each page view, shouldn't I raise it again?

> Yeah, I wouldn't reduce the keepalive, for exactly the reason you
> mentioned: the client will be hitting your web server with every page.
> As you said, 100 bytes is really small, not much to transfer. In fact,
> the packets required to set up the TCP/IP connection will be bigger
> than the image itself, so there is more overhead in creating the
> connection than in downloading the image.


it depends on what the limiting factor is. your argument is valid
network-traffic-wise, but if the limiting factor is the number of worker
processes (or rather system memory, which amounts to the same thing),
you want a very short keepalive in order to free the process for the
next customer instead of having it linger idly while waiting for
follow-up requests from the last one.

the best bet for the original poster would imho be apache2 with a
threaded mpm like worker, a very high MaxClients setting (threads don't
eat up much mem because they share a common image) and a two to three
minute keepalive to minimise protocol overhead (now we can handle lots
of idle worker threads since they don't eat much mem).
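
A sketch of what that could look like on Apache 2.0/2.2 built with the worker MPM. The numbers are illustrative only and would need tuning against real memory and traffic. On Apache 2.4 the MaxClients directive is named MaxRequestWorkers.

```apache
<IfModule worker.c>
    # ServerLimit and ThreadLimit go before the other worker directives.
    ServerLimit          32
    ThreadLimit          64
    StartServers          4
    # MaxClients may not exceed ServerLimit * ThreadsPerChild (32 * 64).
    MaxClients         2048
    MinSpareThreads      64
    MaxSpareThreads     256
    ThreadsPerChild      64
    MaxRequestsPerChild   0
</IfModule>

# Long keep-alive, affordable here because an idle thread is cheap.
KeepAlive On
KeepAliveTimeout 150
```

The trade-off is the one described above: idle keep-alive connections each pin a thread rather than a whole process, so a long timeout costs far less memory than it would with one process per connection.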

joachim
 