Problem with RewriteRule when url contains percent character (Linux Web Servers forum, Web Server and Related Forums category)
Hi,

I'm having problems with a RewriteRule that's applied to URLs with the % character in them; I hope someone can help. The % character is a result of url-encoding non-ASCII words, as in the example below:

1. The word "sécurité" comes out of my db.

2. I construct the following link, using the PHP urlencode function:
   <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>

3. The URL created should be interpreted by a RewriteRule:
   RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

However, the RewriteRule doesn't match my URL, and I see this in the RewriteLog:

   init rewrite engine with requested uri /search/sécurité

So it seems some kind of decoding is going on, so that the RewriteRule never even sees the % character. I have set everything I can think of (MySQL SET NAMES, Apache AddDefaultCharset) to utf-8.

Any ideas?

TIA,

JON
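The symptom described above can be reproduced outside Apache. The following is only a rough Python sketch, assuming the server percent-decodes the URL path to raw bytes before the rule's pattern is applied; Python's `re` module stands in for mod_rewrite's regex engine, and rendering the bytes as Latin-1 is an assumption about how the log ended up showing "sécurité".

```python
import re
from urllib.parse import unquote_to_bytes

# Path as sent by the browser (leading slash dropped, as in the rule's pattern).
requested_path = "search/s%C3%A9curit%C3%A9"

# Percent-decode to raw bytes, roughly what happens before rule matching (assumption).
decoded = unquote_to_bytes(requested_path)

# Viewing the UTF-8 bytes as Latin-1 reproduces the text seen in the RewriteLog.
as_logged = decoded.decode("latin-1")

# The character class from the original rule, transcribed to a bytes pattern.
strict = re.compile(rb"^search/([a-zA-Z0-9-+%]+)$")

print(as_logged)              # the mis-rendered form of the decoded path
print(strict.match(decoded))  # None: bytes 0xC3 and 0xA9 are not in the class
```

In this sketch the pattern never sees a % character at all, only the two high bytes that each "é" decodes to, which is consistent with the rule not matching.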
"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote in message news:eq5kcj$9q9$1@aioe.org...
> I'm having problems with a RewriteRule that's applied to url's with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1
> [QSA,L]
>
> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sécurité
>
> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.
>

So PHP has encoded the URL to some ISO 8859 variant and Apache is decoding those to some UTF ... the next thing to wonder about is the charset your OS uses to store the file name ...

In general, just forget diacritical, language-specific, fancy characters and use 'securite' for the filename. It keeps you out of dozens of cross-platform and cross-language traps, easing migration of a website tenfold.

'The ISO 8859 Alphabet Soup': http://czyborra.com/charsets/iso8859.html

HansH
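Which charset produced the %C3%A9 sequences can be checked by encoding the word both ways. This is a small Python sketch in which `urllib.parse.quote` stands in for PHP's urlencode (an assumption made for illustration; the two functions differ in details such as how spaces are handled, which does not matter for this word).

```python
from urllib.parse import quote, unquote

word = "sécurité"

# Percent-encode the word's bytes under each candidate charset.
utf8_form = quote(word, encoding="utf-8")
latin1_form = quote(word, encoding="latin-1")

# Decoding the UTF-8 form as if it were Latin-1 gives the garbled text.
misread = unquote(utf8_form, encoding="latin-1")

print(utf8_form)    # s%C3%A9curit%C3%A9
print(latin1_form)  # s%E9curit%E9
print(misread)      # sécurité
```

Under these assumptions, the link in the original post matches the UTF-8 encoding of the word, and the garbled log text is what those same bytes look like when read as ISO 8859-1.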
Hi Hans,

Thanks for your answer. I guess I'm best off just avoiding the whole thing.

What got me wondering was the fact that my PHP application can cope fine when this encoded word is passed in the query string:

   /pages/search.php?word=s%C3%A9curit%C3%A9

But perhaps it's simply that different rules apply to a URL and a query string parameter?

Thanks,

JON
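The difference Jon suspects can be illustrated with a short Python sketch. It assumes the usual arrangement in which the query string reaches the application still percent-encoded and is decoded there as UTF-8, while the path is decoded earlier by the server; that is an assumption about a typical Apache and PHP setup, not something checked against a particular version.

```python
from urllib.parse import urlsplit, parse_qs

direct_url = "/pages/search.php?word=s%C3%A9curit%C3%A9"
parts = urlsplit(direct_url)

# The query string is still percent-encoded at this point.
raw_query = parts.query

# The application decodes it itself; parse_qs uses UTF-8 by default.
word = parse_qs(raw_query)["word"][0]

print(parts.path)  # /pages/search.php
print(raw_query)   # word=s%C3%A9curit%C3%A9
print(word)        # sécurité
```

In this picture no pattern is ever matched against the decoded bytes of the query string, which would explain why the direct URL works while the rewritten one does not.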
"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote in message news:eq5kcj$9q9$1@aioe.org...
> Hi,
>
> I'm having problems with a RewriteRule that's applied to url's with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>

How do you get s%C3%A9curit%C3%A9 from sécurité?

sécurité, url-encoded, is s%E9curit%E9.
s%C3%A9curit%C3%A9 decoded is sécurité, as is correctly reported in your rewrite log.

> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

A hyphen in a character class specifies a range unless it's the first or last character in the class. What range are you looking for with 9-+?

> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sécurité

The rewrite rule works correctly: the URI contains à and ©, and the regex doesn't allow for these.

> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.

The URI is decoded before the server tries to resolve it; why would it not be?

Why are you trying to do the heavy lifting with mod_rewrite? Just pass the search term to the script and validate it there; you should validate all user input in your scripts.

RewriteRule ^search/(.+)$ /pages/search.php?word=$1 [QSA,L]

Rich
On Sun, 4 Feb 2007 21:49:08 -0000
"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote:

> So it seems like some kind of decoding is going on so that the
> RewriteRule never even sees the % character. I have set everything I
> can think of (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.

No you haven't. The expression in your RewriteRule is firmly in ASCII, so it fails to match the non-ASCII characters in the URL.

> Any ideas?

Don't faff about with mod_rewrite like that. Or if you really must, fix your regexp. Or, as someone else said, stick to ASCII.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
"rh" <disposable12345@cableone.net> wrote:
>
>"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote:
>>
>> I'm having problems with a RewriteRule that's applied to url's with the %
>> character in them, hope someone can help. The % character is a result of
>> url-encoding non-ASCII words, as in the example below:
>>
>> 1. the word "sécurité" comes out of my db
>>
>> 2. I construct the following link, using the php urlencode function:
>> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
>How do you get s%C3%A9curit%C3%A9 from sécurité
>
>sécurité, url encoded, is s%E9curit%E9

Only in iso-8859-1. In UTF-8, the OP's encoding is correct.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.