This is a discussion on preventing DOS when serving up large files within the Apache Web Server forums, part of the Web Server and Related Forums category.
I have a server that has some large PDF files on it (up to 15 Mb). I make the files available in smaller, 50-page chunks, which seems to be more convenient for most users, but some people really do want an entire book as one huge PDF file. This generally hasn't been a problem over the last few years. However, last night I found my server dead in the water, not responding to my http requests, and just barely responding to me when I ssh'd in. The log file looked like this:

59.78.2.1 - - [21/Jan/2007:19:35:37 +0000] "GET /bk1.pdf HTTP/1.1" 200 40960
59.78.2.1 - - [21/Jan/2007:19:35:38 +0000] "GET /bk1.pdf HTTP/1.1" 200 32768
59.78.2.1 - - [21/Jan/2007:19:35:40 +0000] "GET /bk4.pdf HTTP/1.1" 206 139264
59.78.2.1 - - [21/Jan/2007:19:35:41 +0000] "GET /bk3.pdf HTTP/1.1" 206 40960
59.78.2.1 - - [21/Jan/2007:19:35:42 +0000] "GET /bk2.pdf HTTP/1.1" 200 40960
59.78.2.1 - - [21/Jan/2007:19:35:44 +0000] "GET /bk2.pdf HTTP/1.1" 200 32768
59.78.2.1 - - [21/Jan/2007:19:35:45 +0000] "GET /bk2.pdf HTTP/1.1" 200 40960
59.78.2.1 - - [21/Jan/2007:19:35:46 +0000] "GET /bk2.pdf HTTP/1.1" 200 32768
59.78.2.1 - - [21/Jan/2007:19:35:47 +0000] "GET /bk2.pdf HTTP/1.1" 200 32768

I had about 200 apache child processes running. (MaxClients is set to 150, but I guess apache doesn't feel too constrained by that?) I'm running Apache 1.3.

I'm not sure if this was actually a DOS attack, or just someone's poorly written bot. I have mod_evasive installed, and normally it seems to work well, but in this case it didn't seem to kick in; /var/log/messages shows the IP being blacklisted, but only after I had actually worked around the attack by denying access to the IP in my httpd.conf. Maybe there is something in mod_evasive's algorithm that makes it not trigger on this particular situation? Here is the relevant part of my config:

<IfModule mod_evasive.c>
DOSHashTableSize 3097
DOSPageCount 2
DOSSiteCount 50
DOSPageInterval 1
DOSSiteInterval 1
DOSBlockingPeriod 10
</IfModule>

(After I started sending back 403 responses to this IP, their script kept pounding away with the same request, until I finally got a chance today to ask my web host to block it at the router.)

Is there anything I can do that will make my apache configuration deal more gracefully, in a fully automated way, with this situation? AFAICT, the problem was that apache had as many child processes going as it was willing to run, and since all of those were occupied with responding to this script kiddie, it wasn't able to respond to other requests. I imagine that raising MaxClients won't help, since one user could still start enough processes to max me out. I could use mod_bandwidth, but that doesn't seem like it would help either, since their script doesn't actually seem to have been sucking down any more packets after receiving the first one.

TIA!
Ben Crowell wrote:

> 59.78.2.1 - - [21/Jan/2007:19:35:37 +0000] "GET /bk1.pdf HTTP/1.1" 200 40960
> 59.78.2.1 - - [21/Jan/2007:19:35:38 +0000] "GET /bk1.pdf HTTP/1.1" 200 32768
> 59.78.2.1 - - [21/Jan/2007:19:35:40 +0000] "GET /bk4.pdf HTTP/1.1" 206 139264
> 59.78.2.1 - - [21/Jan/2007:19:35:41 +0000] "GET /bk3.pdf HTTP/1.1" 206 40960
> 59.78.2.1 - - [21/Jan/2007:19:35:42 +0000] "GET /bk2.pdf HTTP/1.1" 200 40960
> 59.78.2.1 - - [21/Jan/2007:19:35:44 +0000] "GET /bk2.pdf HTTP/1.1" 200 32768
> 59.78.2.1 - - [21/Jan/2007:19:35:45 +0000] "GET /bk2.pdf HTTP/1.1" 200 40960
> 59.78.2.1 - - [21/Jan/2007:19:35:46 +0000] "GET /bk2.pdf HTTP/1.1" 200 32768
> 59.78.2.1 - - [21/Jan/2007:19:35:47 +0000] "GET /bk2.pdf HTTP/1.1" 200 32768

Average one hit per second, same page request average, one hit per second.

> I had about 200 apache child processes running. (MaxClients is set to
> 150, but I guess apache doesn't feel too constrained by that?) I'm
> running Apache 1.3.

> I'm not sure if this was actually a DOS attack, or just someone's poorly
> DOSHashTableSize 3097
> DOSPageCount 2
> DOSSiteCount 50
> DOSPageInterval 1
> DOSSiteInterval 1
> DOSBlockingPeriod 10
> </IfModule>

Child processes are your true problem. These forked processes consume CPU time and RAM memory. There are some bug reports on child processes exceeding maximum limit, but this problem appears to have been patched out; not a lot of reports on this problem for more recent 1.3.x versions. You need to hard verify your Apache is not obeying MaxClients. This is usually not a problem.

Reduce your Keep Alive Timeout to five (5) seconds.
Reduce your Max Keep Alive Requests according to actual need, maybe 100.
Reduce your Max Requests Per Child to one-half of your current setting.
Reduce your Timeout to one-half of your current setting.
Set your MaxClients experimentally, 120 to 360, discover what happens.

Challenge here is those settings need to be fine tuned to your average load demand.
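For readers following along, the settings listed above map onto standard Apache 1.3 directives roughly as follows. This is only a sketch: the reply gives ratios ("one-half of your current setting") rather than numbers, so every value except the 5-second keep-alive timeout and the 100 keep-alive requests is a placeholder assumption.

```apache
# httpd.conf (Apache 1.3). Illustrative values only; tune against your own load.
KeepAlive On

# Seconds an idle keep-alive connection may hold a child process.
KeepAliveTimeout 5

# Requests allowed on one keep-alive connection.
MaxKeepAliveRequests 100

# Placeholder: half of a hypothetical current setting of 1000.
MaxRequestsPerChild 500

# Placeholder: half of the stock 1.3 default of 300 seconds.
Timeout 150

# Low end of the 120 to 360 range suggested for experiments.
MaxClients 120
```

Note that stock Apache 1.3 builds cap MaxClients at the compile-time HARD_SERVER_LIMIT (256 by default), so trying the upper end of that range would likely need a rebuild.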
An example is if many of your static html pages contain a lot of graphics, your keep alive might need to be set higher to allow ample time for loading all graphics.

There are no average settings, default settings, which can be applied to all servers. You must consider your average load, then fine tune your settings to handle your load without dropped connections or other problems. Determine the minimum settings you can use just before problems begin, then increase all settings ten percent of current numerical value.

On Dos Evasive, looking at your sample log record, this boy from China did not trigger your evasive module; he was within limits.

These would be _very_ tight settings,

DOSPageCount 1
DOSSiteCount 5
DOSPageInterval 1
DOSSiteInterval 1

One page per second, five connections per second. You can experiment, discover results. Some spider bots might be knocked out, possibly some clients with high speed broadband will be knocked out.

You can set,

DOSBlockingPeriod 1

This would help to stop blocking of innocent clients for a long time.

Same challenge here. Your evasive settings need to be fine tuned according to your average server load. Low load, tight settings. High load, generous settings. Too tight of settings can cause more problems than a DOS attack; be careful.

There are a number of "stress tester" software out there, for free. Software like this will hit your server hard with requests so you can observe how much stress your server can handle. Stress testers will allow you to test your settings in a short period of time, maybe late at night, or on a Sunday. You can test at a time your server load is minimal; less disruptive to clients.

A neat trick on this is to write multiple httpd.conf files, each with different settings, ranging from tight to generous. Rather than editing the same conf file, you simply rename your conf file in use, then rename a new conf file to test. Perform a hard restart or a soft restart to load the new conf file.
You are simply plugging in a series of conf files for testing; quick and easy.

As to the boy in China, personally, I would .htaccess block him or block him in your conf file, for good; never allow the server access. This solves your problem instantly. Odds are not many over in China are visiting your site; no harm done. Servers out of China represent a large percentage of problem servers. My habit is simply to block access and not worry about this.

Your absolute best cure for DOS attacks, which are infrequent these days, is a firmware firewall. A hardware firewall between your server and the internet is the best method and so easy to use.

Block the China boy and stop worrying! Kick him out, leave your server alone.

Purl Gurl
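A minimal sketch of the conf-file block suggested in this reply, assuming Apache 1.3 with mod_access. The directory path is a placeholder, and the address is the one from the posted log.

```apache
# httpd.conf (Apache 1.3, mod_access). The directory path is a placeholder.
<Directory "/var/www/html">
    # Allow rules are evaluated first, then Deny rules,
    # so the Deny line below wins for that one address.
    Order Allow,Deny
    Allow from all
    Deny from 59.78.2.1
</Directory>
```

A denied request still occupies a child process long enough to send the 403, which fits the original poster's report that the script kept pounding away after being blocked; stopping the traffic upstream at a router or firewall avoids even that cost.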
Ben Crowell wrote:

> Wow, you really put a lot of time and work into that reply --
> thanks!

De nada. Readers benefit more when ample discussion is afforded.

My experience is a large majority of Apache problems are attributed to user configuration. Apache is relatively bug free, well, sorta. Best and most stable versions of Apache are 1.3.2x through 1.3.26 versions. Version 1.3.27 and up have a handful of very serious bugs.

The 2.x versions are simply too buggy to be trusted. Those versions contain too many whistles and bells which create bugs. These 2.x versions are large, cumbersome, slow and would contribute to this problem you note; excess CPU and RAM usage. I have not tested the most recent version of the 2.x series. Might be most bugs have been patched out. Many people are happy with those versions.

Readers should note I am highly biased to sleek, trim and efficient programming. I really do not like whistles and bells; all business, no nonsense.

I must stress a point; almost all Apache problems are attributed to user configuration.

Returning to your DOS Evasive module, I am not so sure you really need this module, these days. DOS attacks still happen but are becoming rare. Years back, we read about DOS attacks almost daily, back when these attacks were popular amongst pimple faced teenage idiot boys. Law enforcement actions and improvements in software have fairly much eliminated this popularity of DOS attacks.

My personal opinion is best defense is to run Apache as efficiently as possible. This is to turn off all modules save for bare bones. Another defense is to have a modern machine, one with a gigahertz or better CPU speed and a couple of gigabytes of memory. A good machine will handle circumstances like you experienced, which was not a DOS attack, just "something" really stupid.

Block the server and return a 403 forbidden message. This is a very efficient method. In a month or two, remove the block and discover if the problem server has given up on your server.

The most noted problem I observe these days is email spammers looking for proxy servers, and Chinese looking for a proxy server. Even so, this is not a major problem. Email spammers, I block. Searches for proxy servers from China, I allow because of social concerns. Chinese looking for proxy servers are simply trying to escape government restrictions on internet access. I cannot fault the Chinese for this.

There is a number one major problem; spam email. Estimates are ninety to ninety-five percent of all email is spam. Our email server is clobbered by spammers hundreds of times per day. Keeping ahead of email spammers is a very serious problem. However, this is not an Apache problem.

This returns to my comment about a firmware firewall. This is the best method; plug in a firewall between your servers and the internet. You can buy a good used firewall through Ebay for a decent price.

Apache is not written to be a firewall. Adding modules is ok, but there are limitations, as you discovered. A hardware firewall is dedicated to just this task; being a firewall.

Years back I used our old Netscreen to prevent a variety of attacks, including DOS attacks. This is not a problem today so I removed our firewall. Now our firewall is back into the system to block email spamming servers. This is a big problem today.

Advantage here is a firmware firewall will block attacks and servers before entering your local system. Apache never "sees" those problems, your email and dns servers never "see" those problems. All "bad" traffic is halted at the firewall allowing your servers to run smoothly and efficiently.

A quick and very good cure for almost all common problems related to viruses, trojans and such, is simply to install an inexpensive router, such as an older Linksys. Does not matter if you have one computer or four computers. A router works wonders for preventing a lot of problems, especially by being able to block sensitive incoming "port" connections. You can buy a brand new discontinued Linksys BEFSR41 router for under twenty-five bucks through Ebay. Works great.

For your circumstances, I would trim down Apache to bare bones, use tight settings, block problem servers, and let it go at that. Then look at adding a firewall or, at least, adding a router.

Purl Gurl
Purl Gurl wrote:

<snip>
>
> Block the server and return a 403 forbidden message. This is
> a very efficient method. In a month or two, remove the block
> and discover if the problem server has given up on your server.
>

For business reasons, we went to a simple 403 error page that explains that the requestor's IP is blocked and provides a link to send an email if they would like it removed. It amazes me how few IPs that I block ever result in a request to remove the block. I track the blocks I remove so if I block it again I can note it is the second time. Getting out of that block is *much* harder.

<snip>

Jim
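One way to get a page like the one Jim describes is Apache's ErrorDocument directive. This is a sketch under assumptions, not Jim's actual setup: the file name blocked.html is hypothetical, and the blocked address is borrowed from the log earlier in the thread.

```apache
# .htaccess in the document root (Apache 1.3).
# Needs "AllowOverride Limit FileInfo" for this directory in httpd.conf.
# The file name and the address are illustrative.

# Serve a custom page for every 403 response.
ErrorDocument 403 /blocked.html

# Block the offending address for everything under this directory.
Order Allow,Deny
Allow from all
Deny from 59.78.2.1

# Let everyone, including blocked clients, fetch the explanation page.
# Without this exemption the error page itself is forbidden and Apache
# falls back to its built-in 403 message.
<Files "blocked.html">
    Order Allow,Deny
    Allow from all
</Files>
```

The page itself would carry the explanation and the email link for requesting removal of the block.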
Jim Hayter wrote:

> For business reasons, we went to a simple 403 error page that explains
> that the requestor's IP is blocked and provides a link to send an email
> if they would like it removed. It amazes me how few IPs that I block
> ever result in a request to remove the block.

I'm guessing that a lot of these are spammers who write scripts to search for e-mail addresses, blogs and wikis to spam, etc. The spammer doesn't intend it to be a DOS, but the script is written in a clueless way that has that effect. He's probably running it from a machine with a temporary DHCP-assigned address, and the machine may be a zombie. The next person to be assigned the IP is unlikely to hit your site, so he won't notice the block, and even if he did, he wouldn't think to complain to his ISP.
On 23 Jan, 00:48, Ben Crowell <crowel...@lightSPAMandISmatterEVIL.com> wrote:

<snip>
> Is there anything I can do that will make my apache configuration
> deal more gracefully, in a fully automated way, with this situation?
> AFAICT, the problem was that apache had as many child processes going
> as it was willing to run, and since all of those were occupied with
> responding to this script kiddie, it wasn't able to respond to other
> requests. I imagine that raising MaxClients won't help, since one user
> could still start enough processes to max me out. I could use
> mod_bandwidth, but that doesn't seem like it would help either, since
> their script doesn't actually seem to have been sucking down any more
> packets after receiving the first one.
>
> TIA!

A few simple things you can do for this kind of bot:

a) Use a 2nd computer somewhere to keep an eye on your apache server for you, tailing the last few lines of the access and error logs and emailing you if the initial connection time goes beyond a threshold.

b) In that email, provide an html link to a script that is able to modify your .htaccess file (if you have that turned on) to block specific problem IPs if you don't like what the requests look like. Set the script to only allow connections from trusted IPs or to have a simple login. Consider, as was previously said, a more personal message allowing the user to get unblocked (if they are human); you can then use the same email->link->.htaccess modifier to unblock them.

c) Have a .htaccess rewrite set up for all files over a certain size that you can switch on and off with a script, which rewrites those files to a content distribution network like CORAL. This prevents slashdotting and other DOS type problems.

d) If you would prefer to keep a tighter control over simultaneous connection limits, consider using a server side script as a gateway for those files; if the connection limit is exceeded, the script sends 206 headers back instead of data. There are also Apache 1.3.x modules that can help with this, the names of which I unhelpfully temporarily forget (mod_bw or mod_bandwidth perhaps: http://www.cohprog.com/v3/bandwidth/doc-en.html).

For your specific problem with large PDFs, have you considered linearizing them, so that they can load page by page? You can then allow users to view any page they wish simply by making links of the form:
http://server.com/path_to/file.pdf#page=386
(from what I remember it work when opening a local filesystem pdf in the url of a browser though)

This requires that you know what page they want, or allow searching by your users. You would burst and dump [tools available from http://www.pdfhacks.com] the uncompressed contents of the PDF into a database, page by page (or flat files, of course); you can then allow searching for keywords and generate a list of best match pages with links to those pdfs. Then I guess an auto overflow DIV at the top of the page with the results, followed by an embedded PDF so the user can jump from result to result using the top scrolling DIV to control the PDF.

This works because Apache can serve 206 partial content, and because from (approximately!!) around PDF1.3 2003/4 the leading PDF viewer supports this functionality, even from within the pdf itself.
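As a rough illustration of suggestion b), this is what a script-managed .htaccess might look like, assuming Apache 1.3 with mod_access. The marker comments are a hypothetical convention that lets the script find the lines it owns; nothing in Apache requires them, and the address is simply the one from the posted log.

```apache
# .htaccess in the directory holding the large PDFs (Apache 1.3, mod_access).
# Requires "AllowOverride Limit" for this directory in httpd.conf.
Order Allow,Deny
Allow from all

# BEGIN managed-blocklist (hypothetical marker; the script edits only this part)
Deny from 59.78.2.1
# END managed-blocklist
```

Because Apache re-reads .htaccess files on each request, lines the script adds or removes take effect without a server restart.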
On 24 Jan, 05:26, "shimmyshack" <matt.fa...@gmail.com> wrote:

<snip>
> for your specific problem with large PDF's, have you considered
> linearizing them, so that they can load page by page. You can then
> allow users to view any page they wish simply by making links of the
> form: http://server.com/path_to/file.pdf#page=386
> (from what I remember it work when opening a local filesystem pdf in
> the url of a browser though)
<snip>

sorry: should have been
http://server.com/path_to/file.pdf#page=386
(from what I remember it DOES NOT work when opening a local filesystem pdf in the url of a browser though)
don't you just hate that
shimmyshack wrote:

> b) (In that email provide an html link) to a script that is able to
> modify your .htaccess file (if you have that turned on) to block
> specific problem IPs if you don't like what the requests look like, set
> the script to only allow connections from trusted IPs or to have a
> simple login. Consider as was previously said a more personal message
> allowing the user to get unblocked (if they are human) you can then use
> the same email->link->.htaccess modifier to unblock them

Aha! This sounds like pretty much what I want to do. I had been thinking of doing something like this, but hadn't realized that .htaccess was a way to do it. Thanks!