mod_rewrite and Percent Signs

#1
05-30-2008
George Sexton

mod_rewrite and Percent Signs

I'm seeing hits in my logs:

/some_url.htm%3Fsource%3Dthe_source

which if it were in normal format would be:

/some_url.htm?source=the_source
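For illustration, here is a small Python sketch showing that the logged form is just the normal form with ? and = percent-encoded as %3F and %3D. It uses Python's urllib.parse purely as a stand-in; Apache does not use it.

```python
from urllib.parse import quote, unquote

logged = "/some_url.htm%3Fsource%3Dthe_source"
normal = "/some_url.htm?source=the_source"

# Decoding turns %3F back into "?" and %3D back into "=".
print(unquote(logged))  # /some_url.htm?source=the_source

# quote() leaves "/" alone by default but escapes "?" and "=",
# which reproduces the form seen in the log.
print(quote(normal))    # /some_url.htm%3Fsource%3Dthe_source
```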

I really think these are harvesters looking for addresses. Anyhow, I'd
like to rewrite the URLs so that they redirect to a page. I've tried
just about every variation I could possibly think of on:

RewriteMatch temp .*%3Fsource.3D.*$ some_other_page.html

I've also tried:

RewriteMatch temp 3Fsource some_other_page.html

and it doesn't work.

The last one will work for:

/some3FsourceOtherPage.html

but not

/somepage.html%3Fsource%3dxxxxx


I'm guessing there's some magic about the % sign. Can anyone give me an
idea on how to handle this?

#2
05-30-2008
George Sexton

Re: mod_rewrite and Percent Signs

George Sexton wrote:
> I'm seeing hits in my logs:
>
> /some_url.htm%3Fsource%3Dthe_source
>
> which if it were in normal format would be:
>
> /some_url.htm?source=the_source
>
> I really think these are harvesters looking for addresses. Anyhow, I'd
> like to re-write the URL's so that they redirect to a page. I've tried
> just about every variation I could possibly think of on:
>
> RewriteMatch temp .*%3Fsource.3D.*$ some_other_page.html


Dumb. I meant RedirectMatch.


#3
05-30-2008
phantom

Re: mod_rewrite and Percent Signs

"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:ddGdnZPkPs72xt3VnZ2dnUVZ_s7inZ2d@comcast.com...
> I'm guessing there's some magic about the % sign. Can anyone give me an
> idea on how to handle this?


http://httpd.apache.org/docs/2.2/mod...e.html#quoting


#4
05-31-2008
George Sexton

Re: mod_rewrite and Percent Signs

phantom wrote:
> "George Sexton" <gsexton@mhsoftware.com> wrote in message
> news:ddGdnZPkPs72xt3VnZ2dnUVZ_s7inZ2d@comcast.com...


>>
>> I'm guessing there's some magic about the % sign. Can anyone give me an
>> idea on how to handle this?

>
> http://httpd.apache.org/docs/2.2/mod...e.html#quoting
>
>


First thing I tried. Doesn't work.

#5
05-31-2008
phantom

Re: mod_rewrite and Percent Signs

"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:oNednQfh5KDsEN3VnZ2dnUVZ_rLinZ2d@comcast.com...
> phantom wrote:
>> "George Sexton" <gsexton@mhsoftware.com> wrote in message
>> news:ddGdnZPkPs72xt3VnZ2dnUVZ_s7inZ2d@comcast.com...

>
>>>
>>> I'm guessing there's some magic about the % sign. Can anyone give me an
>>> idea on how to handle this?

>>
>> http://httpd.apache.org/docs/2.2/mod...e.html#quoting
>>
>>

>
> First thing I tried. Doesn't work.


Oops, Apache unescapes the URL before it processes the rewrites:

RedirectMatch temp .*\?source=.* some_other_page.html

should work.
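A simplified Python model of the behaviour phantom describes, assuming that the server splits the request target at the first literal ?, percent-decodes only the part before it, and runs the RedirectMatch regex against that decoded path. The helper name decoded_path is hypothetical, and this is a sketch of the idea, not Apache's actual code:

```python
import re
from urllib.parse import unquote


def decoded_path(request_target: str) -> str:
    """Hypothetical model: set aside everything from the first literal "?"
    on (the query string) and percent-decode what remains."""
    path, _sep, _query = request_target.partition("?")
    return unquote(path)


# phantom's pattern; re.search mirrors an unanchored regex match.
pattern = re.compile(r".*\?source=.*")

requests = [
    "/some_url.htm%3Fsource%3Dthe_source",  # what the harvester sends
    "/some_url.htm?source=the_source",      # an ordinary request with a query
]

for target in requests:
    path = decoded_path(target)
    action = "redirect" if pattern.search(path) else "serve"
    print(f"{target} -> {path!r} -> {action}")
```

Only the first request has a literal ? inside its decoded path, so only it would be redirected in this model.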


#6
05-31-2008
George Sexton

Re: mod_rewrite and Percent Signs

phantom wrote:
> "George Sexton" <gsexton@mhsoftware.com> wrote in message
> news:oNednQfh5KDsEN3VnZ2dnUVZ_rLinZ2d@comcast.com...
>> phantom wrote:
>>> "George Sexton" <gsexton@mhsoftware.com> wrote in message
>>> news:ddGdnZPkPs72xt3VnZ2dnUVZ_s7inZ2d@comcast.com...
>>>> I'm guessing there's some magic about the % sign. Can anyone give me an
>>>> idea on how to handle this?
>>> http://httpd.apache.org/docs/2.2/mod...e.html#quoting
>>>
>>>

>> First thing I tried. Doesn't work.

>
> oops, apache unescapes the url before it processes the rewrites:


Why does it unescape the URL before processing Rewrites, but not the
rest of the time?

IOW, why does

some_url.html%3Fsource%3Dabc

generate a 404?

It's maddening. I suppose I should turn this in as a bug.

>
> RedirectMatch temp .*\?source=.* some_other_page.html
>
> should work.


That's the problem. That is a valid request that I don't want molested.


>
>

#7
05-31-2008
phantom

Re: mod_rewrite and Percent Signs

"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:lqKdnUeMAKV9_NzVnZ2dnUVZ_uadnZ2d@comcast.com...
>
> Why does it unescape the URL before processing Rewrites, but not the rest
> of the time?


It does, all the time.

>
> IOW, why does
>
> some_url.html%3Fsource%3Dabc
>
> generate a 404?
>


Because the file
'some_url.html?source=abc'
doesn't exist on your filesystem.

> It's maddening. I suppose I should turn this in as a bug.
>
>>
>> RedirectMatch temp .*\?source=.* some_other_page.html
>>
>> should work.

>
> That's the problem. That is a valid request that I don't want molested.
>


You misunderstand: RedirectMatch works only on the URI path. The query
string is NOT included, so the above will not molest what you consider a
valid request.
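To make the point concrete, a Python sketch (again only a model, not Apache code) that uses urllib.parse.urlsplit to separate the path from the query string. It assumes, as phantom says, that the RedirectMatch pattern is tested against the path alone:

```python
import re
from urllib.parse import urlsplit

# Same pattern as in the thread; assumed to be tested against the path only.
pattern = re.compile(r".*\?source=.*")

legit = urlsplit("/some_url.html?source=abc")

print(legit.path)   # /some_url.html
print(legit.query)  # source=abc

# The query string is never part of the string the pattern sees.
print(bool(pattern.search(legit.path)))  # False
```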


#8
05-31-2008
George Sexton

Re: mod_rewrite and Percent Signs

phantom wrote:
> "George Sexton" <gsexton@mhsoftware.com> wrote in message
> news:lqKdnUeMAKV9_NzVnZ2dnUVZ_uadnZ2d@comcast.com...
>> Why does it unescape the URL before processing Rewrites, but not the rest
>> of the time?

>
> it does all the time
>
>> IOW, why does
>>
>> some_url.html%3Fsource%3Dabc
>>
>> generate a 404?
>>

>
> because the file
> 'some_url.html?source=abc'
> doesn't exist on your filesystem


Well, the file some_url.html does exist on my system.

That's the problem. Requests for

some_url.html?source=abc

work, while requests for:

some_url.html%3Fsource%3dabc

generate 404. Here's proof:

http://www.mhsoftware.com/index.html%3Fsource%3Dabc

http://www.mhsoftware.com/index.html?source=abc

Evidently, URL decoding only gets done for stuff after a question mark,
but a URL-encoded question mark is not a trigger for URL decoding.

The inconsistency is that mod_rewrite IS URL decoding the %3F and %3D,
while the normal request/service path does not.


>
>> It's maddening. I suppose I should turn this in as a bug.
>>
>>> RedirectMatch temp .*\?source=.* some_other_page.html
>>>
>>> should work.

>> That's the problem. That is a valid request that I don't want molested.
>>

>
> You misunderstand, RedirectMatch is only working on the URI, the query
> string is NOT included, the above will not molest what you consider a valid
> request.


Ahh, there's what I really didn't understand. I'm the first to admit I
suck at regular expressions, so any time I'm using mod_rewrite it's just
a total exercise in frustration for me.

Now that I understand this, it solves the problem for me. I just use

RedirectMatch .*source.abc.* some_other_url.html
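As a rough check of that final pattern, a Python sketch assuming the harvester's %3F and %3D have already been decoded into the path, as phantom describes earlier in the thread. The example paths are illustrative:

```python
import re
from urllib.parse import unquote

# George's pattern: "." matches any one character, "=" included.
pattern = re.compile(r".*source.abc.*")

harvester_path = unquote("/index.html%3Fsource%3Dabc")  # "/index.html?source=abc"
legit_path = "/index.html"  # path part of /index.html?source=abc

print(bool(pattern.search(harvester_path)))  # True
print(bool(pattern.search(legit_path)))      # False
```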



>
>

#9
05-31-2008
phantom

Re: mod_rewrite and Percent Signs

"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:DrGdnX-3A7HjG9zVnZ2dnUVZWhednZ2d@comcast.com...
> phantom wrote:
>> "George Sexton" <gsexton@mhsoftware.com> wrote in message
>> news:lqKdnUeMAKV9_NzVnZ2dnUVZ_uadnZ2d@comcast.com...
>>> Why does it unescape the URL before processing Rewrites, but not the
>>> rest of the time?

>>
>> it does all the time
>>
>>> IOW, why does
>>>
>>> some_url.html%3Fsource%3Dabc
>>>
>>> generate a 404?
>>>

>>
>> because the file
>> 'some_url.html?source=abc'
>> doesn't exist on your filesystem

>
> Well, the file some_url.html does exist on my system.
>
> That's the problem. Requests for
>
> some_url.html?source=abc
>
> work, while requests for:
>
> some_url.html%3Fsource%3dabc
>
> generate 404. Here's proof:
>


You have missed the point; the two requests are NOT the same:

some_url.html?source=abc
is looking for the file 'some_url.html' and will pass the query string
source=abc to it

some_url.html%3Fsource%3dabc
is looking for the file 'some_url.html?source=abc' and will pass no query
string.
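A toy Python model of that lookup. The document root /var/www and the one-file EXISTING set are made-up assumptions; the point is only which filename each request asks for and which query string travels with it:

```python
from urllib.parse import unquote, urlsplit

# Made-up document root holding a single real file.
DOCROOT = "/var/www"
EXISTING = {"/var/www/some_url.html"}


def lookup(request_target: str) -> tuple[str, str, bool]:
    """Toy model: return the file the request maps to, the query string
    passed along, and whether that file exists."""
    parts = urlsplit(request_target)
    filename = DOCROOT + unquote(parts.path)
    return filename, parts.query, filename in EXISTING


print(lookup("/some_url.html?source=abc"))
# ('/var/www/some_url.html', 'source=abc', True)
print(lookup("/some_url.html%3Fsource%3dabc"))
# ('/var/www/some_url.html?source=abc', '', False)
```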


#10
05-31-2008
HansH

Re: mod_rewrite and Percent Signs

"George Sexton" <gsexton@mhsoftware.com> wrote in message
news:DrGdnX-3A7HjG9zVnZ2dnUVZWhednZ2d@comcast.com...
> That's the problem. Requests for
> some_url.html?source=abc
> work, while requests for:
> some_url.html%3Fsource%3dabc
> generate 404. Here's proof:
> http://www.mhsoftware.com/index.html%3Fsource%3Dabc
> http://www.mhsoftware.com/index.html?source=abc


Normally one should not use %-encoded characters within, e.g., an <a
href=...> tag. Encoding and decoding are done automagically *once*, by
the browser and the server.

A link with pre-encoded characters will be transferred with its % signs
encoded as well:
http://www.mhsoftware.com/index.html%253Fsource%253Dabc
Decoding will then only undo the %25, leaving your server with an encoded
question mark and a likewise encoded equals sign.
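A Python sketch of the double-encoding case HansH describes, assuming the link text is percent-encoded a second time on the way out. Browsers generally leave existing %XX escapes alone, so treat this as a model of the scenario rather than of any particular browser:

```python
from urllib.parse import quote, unquote

# A link whose href already contains %-escapes.
pre_encoded = "/index.html%3Fsource%3Dabc"

# Encoding it again turns each "%" into "%25".
on_the_wire = quote(pre_encoded)
print(on_the_wire)  # /index.html%253Fsource%253Dabc

# One round of decoding on the server removes only that outer layer,
# so the path still holds an encoded "?" and "=".
print(unquote(on_the_wire))  # /index.html%3Fsource%3Dabc
```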

Nowadays browsers may try to be forgiving on this issue too; however,
chances are the repaired results vary from broken to useless.

HansH



 