This is a discussion on RewriteEngine on content negotiated content within the Apache Web Server forums, part of the Web Server and Related Forums category.

Hi!

How can I specify rewrite rules that are processed after Apache 2 has chosen a file through content negotiation?

I found that the directory module (mod_dir) does run index.html through the rewrite rules when the user requests only a directory and there is an index.html in it (the user requests /path/ and Apache runs /path/index.html through the rewrite rules).

Why does mod_negotiation not do this?

I would like to make Apache redirect the user to a directory depending on language, so that a user who prefers English is redirected from /path/something.html to /en/path/something.html. That could be accomplished with content negotiation on /path/something.html: when the user requests /path/something.html, Apache finds the best match, for example /path/something.html.en, and then a rewrite rule redirects the user to /en/path/something.html.

Mike

"Mike Mimic" <ppagee@yahoo.com> wrote in message
news:cmjbs8$mqh$1@planja.arnes.si...
> How can I specify rewrite rules that are processed
> after Apache 2 has chosen a file through content negotiation?
>
> I found that the directory module (mod_dir) does run index.html
> through the rewrite rules when the user requests only a directory and
> there is an index.html in it (the user requests /path/ and
> Apache runs /path/index.html through the rewrite rules).

Though mod_dir does not depend on mod_rewrite to do its trick...

> Why does mod_negotiation not do this?

... and neither does mod_negotiation. In short, the three do similar things differently and independently _and_ at different stages of processing a request.

> I would like to make Apache redirect the user to a directory
> depending on language, so that a user who prefers English
> is redirected from /path/something.html to
> /en/path/something.html. That could be accomplished
> with content negotiation on /path/something.html: when the user
> requests /path/something.html, Apache finds the best match,
> for example /path/something.html.en, and then a rewrite
> rule redirects the user to /en/path/something.html.

WHY redirect a client to another location at all if you can handle the matter transparently, that is, without making the visitor aware? Give it a shot the other way around, using ...

    RewriteRule /(.*)\/(.*)\.(.*) /$2.$1.$3 [PT,NC,QSA]

... this should allow (I think, but did not test) mod_negotiation to serve a document in a language other than the one requested, following the Accept-Language header and its quality values. By not sending redirects to the browser and using relative links in your hrefs, the browser will continue to request subsequent pages from the initial language branch. So any page will be _transparently_ served in the 'technically' best language available.

HansH

Hi!

Mike Mimic wrote:
> I would like to make Apache redirect the user to a directory
> depending on language, so that a user who prefers English
> is redirected from /path/something.html to
> /en/path/something.html. That could be accomplished
> with content negotiation on /path/something.html: when the user
> requests /path/something.html, Apache finds the best match,
> for example /path/something.html.en, and then a rewrite
> rule redirects the user to /en/path/something.html.

I made this:

    RewriteCond %{LA-U:REQUEST_URI} !^/(en|fr|it)/
    RewriteCond %{LA-U:REQUEST_URI} \.html\.(en|fr|it)$
    RewriteRule ^/(.*) /%1/$1 [R,L]

And it works.

Mike

"Mike Mimic" <ppagee@yahoo.com> wrote in message
news:cmjo31$3p0$1@planja.arnes.si...
> > I would like to make Apache redirect the user to a directory
> > depending on language, so that a user who prefers English
> > is redirected from /path/something.html to
> > /en/path/something.html. That could be accomplished
> > with content negotiation on /path/something.html: when the user
> > requests /path/something.html, Apache finds the best match,
> > for example /path/something.html.en, and then a rewrite
> > rule redirects the user to /en/path/something.html.
> I made this:
>     RewriteCond %{LA-U:REQUEST_URI} !^/(en|fr|it)/
>     RewriteCond %{LA-U:REQUEST_URI} \.html\.(en|fr|it)$
>     RewriteRule ^/(.*) /%1/$1 [R,L]

Nice ... looking ahead for the result of mod_negotiation.
However, that's a way to keep your server busy too ...
... doing two LookAhead requests and a redirect.

As mod_negotiation will not change the beginning of the URI, why not just use

    RewriteCond %{REQUEST_URI} !^/(en|fr|it)/

or do some crafting on the rule's regex -untested-

    RewriteRule ^[^(/en|/fr|/it)]/(.*\.html) /%1/$1 [L,R]

( Added \.html to the rule to keep it from processing images.
Be aware the rule is defined last, but executed first! )

What would be the response to a client set up for German or Dutch _only_?

Then I started wondering ...
... mod_negotiation needs the versions side by side
... the redirect results in requests per language branch
which seems conflicting to me.

How about converting a language-branch link to a mod_negotiation file structure with a single rule

    RewriteRule ^/(en|fr|it)/(.*\.html) /$2.$1 [PT,L]

HansH
--
Cannot decide what to hate most, LookAheads or Redirects ...

Hi!

HansH wrote:
> Nice ... looking ahead for the result of mod_negotiation.
> However, that's a way to keep your server busy too ...
> ... doing two LookAhead requests and a redirect.

Does Apache not cache it?

> As mod_negotiation will not change the beginning of the URI, why not just use
>     RewriteCond %{REQUEST_URI} !^/(en|fr|it)/

Good idea. :-)

> or do some crafting on the rule's regex -untested-
>     RewriteRule ^[^(/en|/fr|/it)]/(.*\.html) /%1/$1 [L,R]

I doubt this will work, because [] is a character class, so you said "match any single character except (, /, e, n, |, f, r, i, t, )".

> ( Added \.html to the rule to keep it from processing images.

I want it to process images (even images can be multilingual).

> Be aware the rule is defined last, but executed first! )

What do you mean by this? Executed first?

> What would be the response to a client set up for German or Dutch _only_?

I have "LanguagePriority" and "ForceLanguagePriority Prefer Fallback" for those.

> Then I started wondering ...
> ... mod_negotiation needs the versions side by side
> ... the redirect results in requests per language branch
> which seems conflicting to me.
>
> How about converting a language-branch link to a mod_negotiation file
> structure with a single rule
>     RewriteRule ^/(en|fr|it)/(.*\.html) /$2.$1 [PT,L]

Yes, I have this. All rules are:

    # This redirects users to "language" subdirs so that the structure
    # is consistent and relative URLs work
    RewriteCond %{REQUEST_URI} !^/(en|it|sl)/
    RewriteCond %{LA-U:REQUEST_URI} \.html\.(en|it|sl)$
    RewriteRule ^/(.*) /%1/$1 [R,L]

    # This corrects a missing /
    RewriteRule ^/(en|it|sl)$ /$1/ [R,L]

    # This then points Apache to the existing file
    RewriteRule ^/(en|it|sl)/(.*)$ /$2 [PT,E=prefer-language:$1]

I use prefer-language rather than appending the language to the URL because that language version may be missing for that file. In this case Apache chooses the next preferred version.

Mike
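
The negotiation side that Mike mentions (LanguagePriority and ForceLanguagePriority) might look roughly like the sketch below. The directory path and the exact language list are assumptions for illustration; they are not taken from Mike's actual configuration.

```apache
# Assumed document root; adjust to the real site.
<Directory "/var/www/site">
    # MultiViews lets mod_negotiation pick something.html.en, .it or .sl
    # when /something.html itself does not exist on disk.
    Options +MultiViews
</Directory>

# Map file-name extensions to languages (mod_mime).
AddLanguage en .en
AddLanguage it .it
AddLanguage sl .sl

# Order used to break ties, or when the client states no usable preference.
LanguagePriority en it sl

# Prefer:   pick one variant instead of answering 300 Multiple Choices.
# Fallback: serve a LanguagePriority variant instead of 406 Not Acceptable.
ForceLanguagePriority Prefer Fallback
```

With something like this in place, a client set up for German or Dutch only would fall back to the first available language in the LanguagePriority list rather than receive an error.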

"Mike Mimic" <ppagee@yahoo.com> wrote in message
news:cmlso1$23$1@planja.arnes.si...
> > Nice ... looking ahead for the result of mod_negotiation.
> > However, that's a way to keep your server busy too ...
> > ... doing two LookAhead requests and a redirect.
> Does Apache not cache it?

AFAIK it indeed does NOT ;-)

> > As mod_negotiation will not change the beginning of the URI, why not just use
> >     RewriteCond %{REQUEST_URI} !^/(en|fr|it)/
> Good idea. :-)
> > or do some crafting on the rule's regex -untested-
> >     RewriteRule ^[^(/en|/fr|/it)]/(.*\.html) /%1/$1 [L,R]
> I doubt this will work, because [] is a character class, so you said "match
> any single character except (, /, e, n, |, f, r, i, t, )".

However, the three characters ( | ) have special meanings too. Alternating the three groups of characters _and_ delimiting the scope of alternation by round brackets do make this section act as a single character in a negated class in Perl 5.8.3 ... Might not work for the Apache 1 series, but 2.0.47 or above should have better chances :-)
Oops, $1 at the far right end must obviously be a $2.

> > ( Added \.html to the rule to keep it from processing images.
> I want it to process images (even images can be multilingual).

Point taken. Try to limit multilingual images to a single type and go for (html|that_type); reducing rewrites is still a good thing.

> > Be aware the rule is defined last, but executed first! )
> What do you mean by this? Executed first?

The order of configuration and execution of rules and their conditions:
http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html
" The order of rules in the ruleset is important because the rewriting engine processes them in a special (and not very obvious) order. The rule is this: The rewriting engine loops through the ruleset rule by rule (RewriteRule directives) and when a particular rule matches it optionally loops through existing corresponding conditions (RewriteCond directives). For historical reasons the conditions are given first, and so the control flow is a little bit long-winded. "

> Yes, I have this. All rules are:
>     # This redirects users to "language" subdirs so that the structure
>     # is consistent and relative URLs work
>     RewriteCond %{REQUEST_URI} !^/(en|it|sl)/
>     RewriteCond %{LA-U:REQUEST_URI} \.html\.(en|it|sl)$
>     RewriteRule ^/(.*) /%1/$1 [R,L]
>     # This corrects a missing /
>     RewriteRule ^/(en|it|sl)$ /$1/ [R,L]

Does mod_dir not take care of this?

>     # This then points Apache to the existing file
>     RewriteRule ^/(en|it|sl)/(.*)$ /$2 [PT,E=prefer-language:$1]

You must have eaten _and_ digested the documentation ;-)
BTW, what happened to the French language ...

> I use prefer-language rather than appending the language to the URL because
> that language version may be missing for that file. In this
> case Apache chooses the next preferred version.

Oops, I've misinterpreted the table on filenames and links.

Still, I hate LookAheads and Redirects, so let's assume a request for /folder/file.html comes in:
The first rule and its conditions return a redirect to /en/folder/file.html.
A new request is then received for /en/folder/file.html, to be rewritten internally by the last rule to /folder/file.html, setting a preference for English for mod_negotiation, and the proper page is served.

IMH -and biased- O the first rewrite does not add anything good to the total process. All it does is add load to the server and an extra round-trip -ping- delay for the client. Why tell the client about the should-be-internal detour at all?

HansH

HansH wrote:
> "Mike Mimic" <ppagee@yahoo.com> wrote in message
>> Does Apache not cache it?
>
> AFAIK it indeed does NOT ;-)

Then it should. :-) Maybe something like this would do the trick:

    RewriteCond %{LA-U:REQUEST_URI} (.*)
    RewriteRule . - [E=temp_uri:%1]
    RewriteCond %{ENV:temp_uri} !^/(en|fr|it)/
    RewriteCond %{ENV:temp_uri} \.html\.(en|fr|it)$
    RewriteRule ^/(.*) /%1/$1 [R,L]

>>> or do some crafting on the rule's regex -untested-
>>>     RewriteRule ^[^(/en|/fr|/it)]/(.*\.html) /%1/$1 [L,R]
>>
>> I doubt this will work, because [] is a character class, so you said "match
>> any single character except (, /, e, n, |, f, r, i, t, )".
>
> However, the three characters ( | ) have special meanings too. Alternating
> the three groups of characters _and_ delimiting the scope of alternation by
> round brackets do make this section act as a single character in a negated
> class in Perl 5.8.3 ... Might not work for the Apache 1 series, but 2.0.47 or
> above should have better chances :-)

Not by my Perl book (Programming Perl, 3rd ed., chapter 5.4.1, Custom Character Classes):

It's also meaningless to specify quantifiers, assertions, or alternation inside a character class, since the characters are interpreted individually. For example, [fee|fie|foe|foo] means the same thing as [feio|].

> Point taken. Try to limit multilingual images to a single type and go for
> (html|that_type); reducing rewrites is still a good thing.

Indeed.

> The order of configuration and execution of rules and their conditions:
> http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html
> " The order of rules in the ruleset is important because the rewriting
> engine processes them in a special (and not very obvious) order. The rule is
> this: The rewriting engine loops through the ruleset rule by rule
> (RewriteRule directives) and when a particular rule matches it optionally
> loops through existing corresponding conditions (RewriteCond directives).
> For historical reasons the conditions are given first, and so the control
> flow is a little bit long-winded. "

I skipped this part. Interesting. :-)

>> # This corrects a missing /
>> RewriteRule ^/(en|it|sl)$ /$1/ [R,L]
>
> Does mod_dir not take care of this?

No, because there are no "real" language subdirs.

>> # This then points Apache to the existing file
>> RewriteRule ^/(en|it|sl)/(.*)$ /$2 [PT,E=prefer-language:$1]
>
> You must have eaten _and_ digested the documentation ;-)

I will take this as a compliment.

> BTW, what happened to the French language ...

I am playing with different languages. I have not yet decided on them.

> Still, I hate LookAheads and Redirects, so let's assume a request for
> /folder/file.html comes in:
> The first rule and its conditions return a redirect to /en/folder/file.html.
> A new request is then received for /en/folder/file.html, to be rewritten
> internally by the last rule to /folder/file.html, setting a preference for
> English for mod_negotiation, and the proper page is served.
>
> IMH -and biased- O the first rewrite does not add anything good to the
> total process. All it does is add load to the server and an extra
> round-trip -ping- delay for the client. Why tell the client about
> the should-be-internal detour at all?

Yes. It is not so obvious.

I think I should explain the whole idea. The idea is that when a user opens a page, the server sends the language based on the browser settings. The problem is that those settings may be wrong (maybe the user is a guest on somebody else's computer with a different language, or something). So the user should be able to override those settings, and that can be done with those "directory suffixes". When a user comes to /en/... they will get the English version no matter what (that is, if there is an English version of the file). That is accomplished with the last rule.

So the question is why there is the first rule and that look-ahead. The point is that without that first-request redirection there would be two types of URLs: one with a "directory suffix" (for those who changed language) and one without. But a URL without the suffix does not identify content uniquely (it depends on the language settings in the browser). And I would like URLs to represent the content of the page uniquely. I think that this is the point of it.

Then comes the question of why I should use content negotiation at all if I do not like that the content of URLs is not unique. I could redirect the first request to a subdirectory based on language, and then everything would be simple. But this means that I would need to synchronize many directory structures which are basically the same; they would differ only in the root directory. And with many languages this is a nightmare. So I would like to have all language variants of a file in the same directory and choose between them based on that "directory suffix".

And this is it. (And it is even better because it works even if some language variant is missing.)

Mike
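
The evaluation order quoted from the mod_rewrite documentation can be made concrete by annotating the first rule of the rule set discussed in this thread. The numbers in the comments only illustrate the documented order; the directives themselves are the ones already posted.

```apache
# (2) Checked second, and only if the RewriteRule pattern below matched.
RewriteCond %{REQUEST_URI} !^/(en|fr|it)/

# (3) Checked third, and only if condition (2) held, so the look-ahead
#     subrequest is issued only for URLs without a language prefix.
RewriteCond %{LA-U:REQUEST_URI} \.html\.(en|fr|it)$

# (1) The pattern ^/(.*) is tested first.
# (4) The substitution is applied only if all conditions held;
#     %1 is the language captured by the last matched RewriteCond.
RewriteRule ^/(.*) /%1/$1 [R,L]
```

This is why putting the cheap prefix test before the LA-U condition matters: requests that already carry a language prefix never reach the look-ahead.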

"Mike Mimic" <ppagee@yahoo.com> wrote in message
news:cmm6e3$7rf$1@planja.arnes.si...
> HansH wrote:
> > "Mike Mimic" <ppagee@yahoo.com> wrote in message
> Maybe something like this would do the trick:
>     RewriteCond %{LA-U:REQUEST_URI} (.*)
>     RewriteRule . - [E=temp_uri:%1]
>     RewriteCond %{ENV:temp_uri} !^/(en|fr|it)/
>     RewriteCond %{ENV:temp_uri} \.html\.(en|fr|it)$
>     RewriteRule ^/(.*) /%1/$1 [R,L]

OK! Now apply the pattern of the second rule to the first AND chain the two together: if the first does not match, the second is not even tried... (For this, however, the pattern above is a bad sample.)

> >>> or do some crafting on the rule's regex -untested-
> >>>     RewriteRule ^[^(/en|/fr|/it)]/(.*\.html) /%1/$1 [L,R]
> Not by my Perl book (Programming Perl, 3rd ed., chapter 5.4.1, Custom
> Character Classes):
> It's also meaningless to specify quantifiers, assertions, or alternation
> inside a character class, since the characters are interpreted
> individually. For example, [fee|fie|foe|foo] means the same thing as
> [feio|].

Still, that exclusion does not include [^(fee|fie|foe|foo)] ;-)
It all depends on whether the implementation handles () inside [] or not.... [abc(def)] might as well return d, e or f in $1 ...
At least the PostgreSQL 7 regex handles this by your book ;-)

> >> # This corrects a missing /
> >> RewriteRule ^/(en|it|sl)$ /$1/ [R,L]
> > Does mod_dir not take care of this?
> No, because there are no "real" language subdirs.

Roger!

> Yes. It is not so obvious.
> I think I should explain the whole idea. The idea is that when a user
> opens a page, the server sends the language based on the browser settings.
> The problem is that those settings may be wrong (maybe the user is a guest
> on somebody else's computer with a different language, or something). So
> the user should be able to override those settings, and that can be done
> with those "directory suffixes". When a user comes to /en/... they will
> get the English version no matter what (that is, if there is an English
> version of the file). That is accomplished with the last rule.
>
> So the question is why there is the first rule and that look-ahead. The
> point is that without that first-request redirection there would be
> two types of URLs: one with a "directory suffix" (for those who changed
> language) and one without. But a URL without the suffix does not identify
> content uniquely (it depends on the language settings in the browser).

So, the looking-ahead rewrite is only invoked on the initial request for www.foo.bar ... probably changed to www.foo.bar/index.html and from there to www.foo.bar/xx/index.html to finally serve /index.html.xx. That largely takes away the burden of a potential server (over)load and extended client delays.

> And I would like URLs to represent the content of the page uniquely. I
> think that this is the point of it.

Although the Vary: response header should make caches aware of the issue, better to prevent than to cure.

> Then comes the question of why I should use content negotiation at all if
> I do not like that the content of URLs is not unique. I could redirect the
> first request to a subdirectory based on language, and then everything
> would be simple. But this means that I would need to synchronize
> many directory structures which are basically the same; they would differ
> only in the root directory. And with many languages this is a
> nightmare. So I would like to have all language variants of a file in the
> same directory and choose between them based on that "directory suffix".

The preference for maintaining language versions side by side rather than branch by branch is rather obvious.

> And this is it. (And it is even better because it works even if some
> language variant is missing.)

I think I'll file this conversation for future reference.

Good luck translating your pages ...

HansH

Hi!

HansH wrote:
>>>>> or do some crafting on the rule's regex -untested-
>>>>>     RewriteRule ^[^(/en|/fr|/it)]/(.*\.html) /%1/$1 [L,R]
>>
>> Not by my Perl book (Programming Perl, 3rd ed., chapter 5.4.1, Custom
>> Character Classes):
>> It's also meaningless to specify quantifiers, assertions, or alternation
>> inside a character class, since the characters are interpreted
>> individually. For example, [fee|fie|foe|foo] means the same thing as
>> [feio|].
>
> Still, that exclusion does not include [^(fee|fie|foe|foo)] ;-)
> It all depends on whether the implementation handles () inside [] or
> not.... [abc(def)] might as well return d, e or f in $1 ...
> At least the PostgreSQL 7 regex handles this by your book ;-)

I think that if it does not work in Perl it will not work in Apache either (without some external programs or code changes), as Apache 2 uses a Perl-compatible regex library.

> So, the looking-ahead rewrite is only invoked on the initial request for
> www.foo.bar ... probably changed to www.foo.bar/index.html and from there to
> www.foo.bar/xx/index.html to finally serve /index.html.xx. That largely
> takes away the burden of a potential server (over)load and extended client
> delays.

Yes, only if there is no "directory suffix", and that happens only at the initial request (or maybe not even then).

So the rules are:

    # This redirects users to "language" subdirs
    RewriteCond %{REQUEST_URI} !^/(en|fr|it)/
    RewriteCond %{LA-U:REQUEST_URI} \.(en|fr|it)$
    RewriteRule ^/(.*) /%1/$1 [R,L]

    # This corrects a missing /
    RewriteRule ^/(en|fr|it)$ /$1/ [R,L]

    # This then points Apache to the existing file
    RewriteRule ^/(en|fr|it)/(.*)$ /$2 [PT,E=prefer-language:$1]

The first rule must match everything (it cannot be optimized just for html files and other possibly multilingual files such as pdf, ps and doc files), because otherwise the redirect does not work for requests for directories: the rule is not matched for the initial request, and so the subrequests (where it gets to index.html) are not even tried.

But this is not really a problem, as I would like the user to be redirected as soon as possible, and that can happen for many reasons (there can be many types of multilingual files). So the user is redirected as soon as they reach a multilingual file (that is also the reason why I removed "html" from the second condition). And there are many multilingual files, aren't there (why else would someone use these rules)? And "match everything" is quite a fast regular expression.

> I think I'll file this conversation for future reference.

Again a compliment.

> Good luck translating your pages ...

Others will do this for me. I will just copy and paste the text. But I have to prepare the space I can copy it into. :-)

Mike
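
The intent behind the [^(/en|/fr|/it)] attempt, "match only paths that do not already start with a language prefix", can be expressed in a Perl-compatible regex with a negative lookahead instead of a character class. A minimal, untested sketch, assuming an Apache 2 build whose regex library supports (?!...):

```apache
# (?!(?:en|fr|it)/) is a zero-width assertion: the match fails when the
# path continues with en/, fr/ or it/, and the assertion itself consumes
# no characters, so $1 still holds the whole path after the leading slash.
RewriteCond %{LA-U:REQUEST_URI} \.(en|fr|it)$
RewriteRule ^/(?!(?:en|fr|it)/)(.*)$ /%1/$1 [R,L]
```

This folds the first RewriteCond of the rule set into the rule pattern. Whether it is any faster than the explicit condition is not something this thread establishes.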

"Mike Mimic" <ppagee@yahoo.com> wrote in message
news:cmp1st$n46$1@planja.arnes.si...
> So the rules are:
>
>     # This redirects users to "language" subdirs
>     RewriteCond %{REQUEST_URI} !^/(en|fr|it)/
>     RewriteCond %{LA-U:REQUEST_URI} \.(en|fr|it)$
>     RewriteRule ^/(.*) /%1/$1 [R,L]
>
>     # This corrects a missing /
>     RewriteRule ^/(en|fr|it)$ /$1/ [R,L]
>
>     # This then points Apache to the existing file
>     RewriteRule ^/(en|fr|it)/(.*)$ /$2 [PT,E=prefer-language:$1]

It finally hit me:
You are trying to do the _same_thing_ the Apache Software Foundation is doing at http://httpd.apache.org/docs-2.0/

And this is their way of doing it:

    AliasMatch ^/manual(?:/(?:de|en|es|fr|ja|ko|ru))?(/.*)?$ "/usr/share/doc/manual/$1"

    <Directory "/usr/share/doc/manual/">
        <Files *.html>
            SetHandler type-map
        </Files>

        SetEnvIf Request_URI ^/manual/(de|en|es|fr|ja|ko|ru)/ prefer-language=$1
        RedirectMatch 301 ^/manual(?:/(de|en|es|fr|ja|ko|ru)){2,}(/.*)?$ /manual/$1$2
    </Directory>

For each set of files a mapper file is included, containing a list like:

    URI: <base name>.html.fr
    Content-Language: fr
    Content-type: text/html; charset=ISO-8859-1

    URI: <base name>.html.ja.euc-jp
    Content-Language: ja
    Content-type: text/html; charset=EUC-JP
    <EOF>

Do notice that the charset varies per language. If AddCharset does not work well enough for you, you may need this for a language like 'sl' too ;-)

This type-map handler even takes care of negotiating the initial request!! Apparently, the first variant listed is the default in case none of the browser's accepted languages is available. Hm... looks like the languages are alphabetically ordered ... giving me a mix of German and English pages in a browser preferring [only] Dutch.

I assume the type-map handler initiates this header:

    Vary: negotiate,accept-language,accept-charset,Accept-Encoding

IMHO this header is pointless, except for requests for /manual.

Quite a twist at the end of a long thread ...

HansH
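
Transferred to the /en, /fr, /it layout discussed earlier in the thread, the same pattern might look like the following. This is an untested sketch, not the ASF configuration: the document root /var/www/site is an assumption, and it relies on MultiViews rather than on type-map files.

```apache
# Map /en/..., /fr/... and /it/... onto the single shared tree
# (assumed to live in /var/www/site), dropping the language prefix.
AliasMatch ^/(?:en|fr|it)(/.*)?$ "/var/www/site$1"

<Directory "/var/www/site">
    # Let mod_negotiation choose among file.html.en, file.html.fr, ...
    Options +MultiViews

    # The language prefix in the requested URL overrides the browser's
    # Accept-Language preference; if that variant is missing, normal
    # negotiation picks the next best one.
    SetEnvIf Request_URI ^/(en|fr|it)/ prefer-language=$1
</Directory>
```

Unlike Mike's first rule, this sketch does not redirect an un-prefixed initial request to a language branch; it only covers the "prefix selects the language" half of the scheme.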