mod_rewrite and Percent Signs
Apache Web Server forums, Web Server and Related Forums category
I'm seeing hits in my logs:

/some_url.htm%3Fsource%3Dthe_source

which if it were in normal format would be:

/some_url.htm?source=the_source

I really think these are harvesters looking for addresses. Anyhow, I'd like to re-write the URL's so that they redirect to a page. I've tried just about every variation I could possibly think of on:

RewriteMatch temp .*%3Fsource.3D.*$ some_other_page.html

I've also tried:

RewriteMatch temp 3Fsource some_other_page.html

and it doesn't work.

The last one will work for:

/some3FsourceOtherPage.html

but not

/somepage.html%3Fsource%3dxxxxx

I'm guessing there's some magic about the % sign. Can anyone give me an idea on how to handle this?
George Sexton wrote:
> I'm seeing hits in my logs:
>
> /some_url.htm%3Fsource%3Dthe_source
>
> which if it were in normal format would be:
>
> /some_url.htm?source=the_source
>
> I really think these are harvesters looking for addresses. Anyhow, I'd
> like to re-write the URL's so that they redirect to a page. I've tried
> just about every variation I could possibly think of on:
>
> RewriteMatch temp .*%3Fsource.3D.*$ some_other_page.html

Dumb. I meant RedirectMatch

> I've also tried:
>
> RewriteMatch temp 3Fsource some_other_page.html
>
> and it doesn't work.
>
> The last one will work for:
>
> /some3FsourceOtherPage.html
>
> but not
>
> /somepage.html%3Fsource%3dxxxxx
>
> I'm guessing there's some magic about the % sign. Can anyone give me an
> idea on how to handle this?
"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:ddGdnZPkPs72xt3VnZ2dnUVZ_s7inZ2d@comcast.com...
> I'm seeing hits in my logs:
>
> /some_url.htm%3Fsource%3Dthe_source
>
> which if it were in normal format would be:
>
> /some_url.htm?source=the_source
>
> I really think these are harvesters looking for addresses. Anyhow, I'd
> like to re-write the URL's so that they redirect to a page. I've tried
> just about every variation I could possibly think of on:
>
> RewriteMatch temp .*%3Fsource.3D.*$ some_other_page.html
>
> I've also tried:
>
> RewriteMatch temp 3Fsource some_other_page.html
>
> and it doesn't work.
>
> The last one will work for:
>
> /some3FsourceOtherPage.html
>
> but not
>
> /somepage.html%3Fsource%3dxxxxx
>
> I'm guessing there's some magic about the % sign. Can anyone give me an
> idea on how to handle this?

http://httpd.apache.org/docs/2.2/mod...e.html#quoting
phantom wrote:
> "George Sexton" <gsexton@mhsoftware.com> wrote in message
> news:ddGdnZPkPs72xt3VnZ2dnUVZ_s7inZ2d@comcast.com...
>>
>> I'm guessing there's some magic about the % sign. Can anyone give me an
>> idea on how to handle this?
>
> http://httpd.apache.org/docs/2.2/mod...e.html#quoting

First thing I tried. Doesn't work.
"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:oNednQfh5KDsEN3VnZ2dnUVZ_rLinZ2d@comcast.com...
> phantom wrote:
>> "George Sexton" <gsexton@mhsoftware.com> wrote in message
>> news:ddGdnZPkPs72xt3VnZ2dnUVZ_s7inZ2d@comcast.com...
>>>
>>> I'm guessing there's some magic about the % sign. Can anyone give me an
>>> idea on how to handle this?
>>
>> http://httpd.apache.org/docs/2.2/mod...e.html#quoting
>
> First thing I tried. Doesn't work.

oops, apache unescapes the url before it processes the rewrites:

RedirectMatch temp .*\?source=.* some_other_page.html

should work.
phantom wrote:
> "George Sexton" <gsexton@mhsoftware.com> wrote in message
> news:oNednQfh5KDsEN3VnZ2dnUVZ_rLinZ2d@comcast.com...
>> phantom wrote:
>>> "George Sexton" <gsexton@mhsoftware.com> wrote in message
>>> news:ddGdnZPkPs72xt3VnZ2dnUVZ_s7inZ2d@comcast.com...
>>>> I'm guessing there's some magic about the % sign. Can anyone give me an
>>>> idea on how to handle this?
>>> http://httpd.apache.org/docs/2.2/mod...e.html#quoting
>> First thing I tried. Doesn't work.
>
> oops, apache unescapes the url before it processes the rewrites:

Why does it unescape the URL before processing Rewrites, but not the rest of the time?

IOW, why does

some_url.html%3Fsource%3Dabc

generate a 404?

It's maddening. I suppose I should turn this in as a bug.

> RedirectMatch temp .*\?source=.* some_other_page.html
>
> should work.

That's the problem. That is a valid request that I don't want molested.
"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:lqKdnUeMAKV9_NzVnZ2dnUVZ_uadnZ2d@comcast.com...
> Why does it unescape the URL before processing Rewrites, but not the rest
> of the time?

it does all the time

> IOW, why does
>
> some_url.html%3Fsource%3Dabc
>
> generate a 404?

because the file 'some_url.html?source=abc' doesn't exist on your filesystem

> It's maddening. I suppose I should turn this in as a bug.
>
>> RedirectMatch temp .*\?source=.* some_other_page.html
>>
>> should work.
>
> That's the problem. That is a valid request that I don't want molested.

You misunderstand, RedirectMatch is only working on the URI, the query string is NOT included, the above will not molest what you consider a valid request.
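The behaviour described in this reply can be modelled in a few lines of Python. This is only a sketch of the idea, not Apache's actual code: the helper names `path_seen_by_matcher` and `would_redirect` are made up for illustration, and the model simply assumes the request is split at the first literal `?` and that the path part is percent-decoded before the pattern is tried.

```python
import re
from urllib.parse import unquote

# The pattern from the suggested RedirectMatch line.
PATTERN = re.compile(r".*\?source=.*")


def path_seen_by_matcher(request_target: str) -> str:
    """Hypothetical model: cut off the query string at the first literal '?',
    then percent-decode what is left (the URL path)."""
    raw_path, _, _query = request_target.partition("?")
    return unquote(raw_path)


def would_redirect(request_target: str) -> bool:
    """True if the decoded path (query string excluded) matches the pattern."""
    return PATTERN.search(path_seen_by_matcher(request_target)) is not None


for target in ("/some_url.htm%3Fsource%3Dthe_source",
               "/some_url.htm?source=the_source"):
    print(target, "->", would_redirect(target))
```

Under these assumptions only the request with the encoded `%3F` matches, because its decoded path contains a question mark; the ordinary request has its query string removed before matching, so it is left alone.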
phantom wrote:
> "George Sexton" <gsexton@mhsoftware.com> wrote in message
> news:lqKdnUeMAKV9_NzVnZ2dnUVZ_uadnZ2d@comcast.com...
>> Why does it unescape the URL before processing Rewrites, but not the rest
>> of the time?
>
> it does all the time
>
>> IOW, why does
>>
>> some_url.html%3Fsource%3Dabc
>>
>> generate a 404?
>
> because the file
> 'some_url.html?source=abc'
> doesn't exist on your filesystem

Well, the file some_url.html does exist on my system.

That's the problem. Requests for

some_url.html?source=abc

work, while requests for:

some_url.html%3Fsource%3dabc

generate 404. Here's proof:

http://www.mhsoftware.com/index.html%3Fsource%3Dabc
http://www.mhsoftware.com/index.html?source=abc

Evidently, URL decoding only gets done for stuff after a question mark, but a URL-encoded question mark is not a trigger for URL decoding.

The inconsistency is that mod_rewrite IS URL decoding the %3F and %3D, while the normal request/service path does not.

>> It's maddening. I suppose I should turn this in as a bug.
>>
>>> RedirectMatch temp .*\?source=.* some_other_page.html
>>>
>>> should work.
>> That's the problem. That is a valid request that I don't want molested.
>
> You misunderstand, RedirectMatch is only working on the URI, the query
> string is NOT included, the above will not molest what you consider a valid
> request.

Ahh, that's what I really didn't understand. I'm the first to admit I suck at regular expressions, so any time I'm using mod_rewrite it's just a total exercise in frustration for me.

Now that I understand this, it solves the problem for me. I just use

RedirectMatch .*source.abc.* some_other_url.html
"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:DrGdnX-3A7HjG9zVnZ2dnUVZWhednZ2d@comcast.com...
> phantom wrote:
>> "George Sexton" <gsexton@mhsoftware.com> wrote in message
>> news:lqKdnUeMAKV9_NzVnZ2dnUVZ_uadnZ2d@comcast.com...
>>> Why does it unescape the URL before processing Rewrites, but not the
>>> rest of the time?
>>
>> it does all the time
>>
>>> IOW, why does
>>>
>>> some_url.html%3Fsource%3Dabc
>>>
>>> generate a 404?
>>
>> because the file
>> 'some_url.html?source=abc'
>> doesn't exist on your filesystem
>
> Well, the file some_url.html does exist on my system.
>
> That's the problem. Requests for
>
> some_url.html?source=abc
>
> work, while requests for:
>
> some_url.html%3Fsource%3dabc
>
> generate 404. Here's proof:

You have missed the point, the two requests are NOT the same:

some_url.html?source=abc
is looking for the file 'some_url.html' and will pass the query string source=abc to it

some_url.html%3Fsource%3dabc
is looking for the file 'some_url.html?source=abc' and not passing any query string.
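The difference between the two requests can be seen with Python's standard URL parser, which, like the behaviour described here, only treats a literal `?` as the start of the query string. This is an illustrative sketch rather than a description of Apache's internals, and `describe` is a made-up helper name.

```python
from urllib.parse import urlsplit, unquote


def describe(request_target: str) -> tuple[str, str]:
    """Return (decoded path, raw query string) for a request target.

    The split into path and query happens on the literal '?' first;
    percent-decoding of the path happens afterwards.
    """
    parts = urlsplit(request_target)
    return unquote(parts.path), parts.query


for target in ("/some_url.html?source=abc",
               "/some_url.html%3Fsource%3dabc"):
    path, query = describe(target)
    print(f"{target}: file looked up = {path!r}, query string = {query!r}")
```

For the first request the path is `/some_url.html` with query string `source=abc`; for the second the whole thing, decoded, is the path `/some_url.html?source=abc` and the query string is empty, which is why a server would go looking for a file of that name.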
"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:DrGdnX-3A7HjG9zVnZ2dnUVZWhednZ2d@comcast.com...
> That's the problem. Requests for
> some_url.html?source=abc
> work, while requests for:
> some_url.html%3Fsource%3dabc
> generate 404. Here's proof:
> http://www.mhsoftware.com/index.html%3Fsource%3Dabc
> http://www.mhsoftware.com/index.html?source=abc

Normally one should not use %-encoded characters within e.g. an <a href=...> tag. Encoding and decoding are each done automagically *once*, by browser and server respectively. A link with pre-encoded characters will be transferred with its % signs encoded:

http://www.mhsoftware.com/index.html%253Fsource%253Dabc

Decoding will then only decode the %25, leaving your server with an encoded question mark and, likewise, an encoded equals sign.

Nowadays browsers may try to be forgiving on this issue too; however, chances are the repaired results vary from broken toward useless.

HansH
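The double-encoding effect described above can be reproduced with Python's `urllib.parse`. This is a sketch of the mechanism only; it makes no claim about what any particular browser or harvester actually sends, and the variable names are made up for illustration.

```python
from urllib.parse import quote, unquote

# A path that was already percent-encoded when it was written into a link.
already_encoded = "/index.html%3Fsource%3Dabc"

# Encoding it a second time turns each '%' into '%25'.
encoded_again = quote(already_encoded)
print(encoded_again)

# A single round of decoding undoes only the outer layer ...
decoded_once = unquote(encoded_again)
print(decoded_once)

# ... and a second round is needed to get back a literal '?' and '='.
decoded_twice = unquote(decoded_once)
print(decoded_twice)
```

After one decoding pass the path still contains `%3F` and `%3D`, which matches the point that decoding happens once and leaves the server with an encoded question mark and equals sign.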