This is a discussion on google vs content negotiation (foo.html.it, foo.html.pt, etc) within the Linux Web Servers forums, part of the Web Server and Related Forums category.
Hi,

I have web pages for my site, dedasys.com, in both English and Italian, using files like index.html.it and index.html.en.

Google doesn't seem to be caching the Italian version of the page though, after a month or two of it being up, so I suspect that the extra .it (it's linked directly from the English page that way) may be confusing it... Has anyone else observed this, or does anyone know more about it?

If this is actually what Google does, it should probably be publicized (documented in Apache's docs, for starters) so that people don't run into the same problem.

Thanks for your time,
--
David N. Welton
Consulting: http://www.dedasys.com/
Personal: http://www.dedasys.com/davidw/
Free Software: http://www.dedasys.com/freesoftware/
Apache Tcl: http://tcl.apache.org/
|
|||
|
davidw@dedasys.com (David N. Welton) wrote:

> Hi, I have web pages for my site, dedasys.com in both English and
> Italian, using files like index.html.it and index.html.en.
> Google doesn't seem to be caching the Italian version of the page
> though,

Idiots at work! The simpletons at Google index only documents with certain "extensions". For example, ".htm8" ".mac" ".win" ending documents are not indexed even if the type is text/html.
http://www.google.com/groups?th=7f00a97b04fa2509
The Google cretins cannot understand the concept of MIME types.

BTW: I suggest reading http://www.cs.tut.fi/~jkorpela/flags.html

--
http://www.unics.uni-hannover.de/nhtcapri/plonk.txt
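To make the claim above concrete, here is a minimal Python sketch of a filter that decides by the final filename extension instead of the served MIME type. It is only an illustration of the behaviour described in this post: the allow-list `INDEXABLE_EXTENSIONS` and the function names are hypothetical, and nothing in the thread says which extensions Google actually accepted.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list, assumed for illustration only.
INDEXABLE_EXTENSIONS = {".html", ".htm"}


def final_extension(url_path: str) -> str:
    """Return the last dot-suffix of the path, lowercased ('' if there is none)."""
    return PurePosixPath(url_path).suffix.lower()


def looks_indexable(url_path: str) -> bool:
    """Decide purely from the final extension, ignoring the Content-Type header."""
    return final_extension(url_path) in INDEXABLE_EXTENSIONS


if __name__ == "__main__":
    for path in ("/index.html.it", "/index.it.html", "/page.htm8"):
        print(path, final_extension(path), looks_indexable(path))
```

Under that assumption, index.html.it is rejected because its final extension is .it, while index.it.html passes, even though a server could send both as text/html.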
|
|||
|
Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de> writes:
> davidw@dedasys.com (David N. Welton) wrote:
> > Hi, I have web pages for my site, dedasys.com in both English and
> > Italian, using files like index.html.it and index.html.en. Google
> > doesn't seem to be caching the Italian version of the page though,
> Idiots at work! The simpletons at Google index only documents with
> certain "extensions". For example, ".htm8" ".mac" ".win" ending
> documents are not indexed even if the type is text/html.
> http://www.google.com/groups?th=7f00a97b04fa2509 The Google cretins
> cannot understand the concept of MIME types.

Ah, so my thesis was correct. Thanks for confirming!

> BTW: I suggest reading http://www.cs.tut.fi/~jkorpela/flags.html

Yes, I know and agree with it. Actually I just leave the flag there so that people concerned with these things will be bothered by it ;-)

--
David N. Welton
|
|||
|
On Thu, 23 Oct 2003, David N. Welton wrote:
> Hi, I have web pages for my site, dedasys.com in both English and
> Italian, using files like index.html.it and index.html.en.
>
> Google doesn't seem to be caching the Italian version of the page
> though, after a month or two of it being up, so I suspect that the
> extra .it (it's linked directly from the English page that way) may be
> confusing it...

You've already got an answer to that, but it should be no big deal to re-organise the naming to foobar.en.html, foobar.it.html, etc., should it?

> If this is actually what google does, it should probably be publicized
> (documented in apache's docs, for starters) so that people don't run
> into the same problem.

More to the point, perhaps Google could be persuaded to catch up with specifications. I wouldn't really want to see the Apache documentation itself bogged-down with short-term workarounds for failures by others to conform to published interworking specifications.

cheers

(yes, if you've found my web page on the topic - cited by the Apache tutorials - you can well imagine I'm considering adding a mention of this issue.)
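For anyone wanting to script the renaming suggested above, a rough Python sketch follows. It assumes the language variants use a fixed set of two-letter codes (`LANGUAGE_CODES` below is a guess based on the examples in this thread) and that only .html files are involved. The helper names `swapped_name` and `rename_tree` are made up for this example.

```python
from pathlib import Path

# Assumption: the language suffixes in use; adjust to match the site.
LANGUAGE_CODES = {"en", "it", "pt"}


def swapped_name(filename: str) -> str:
    """Turn 'index.html.it' into 'index.it.html'; leave other names unchanged."""
    parts = filename.split(".")
    if len(parts) >= 3 and parts[-1] in LANGUAGE_CODES and parts[-2] == "html":
        parts[-1], parts[-2] = parts[-2], parts[-1]
    return ".".join(parts)


def rename_tree(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Collect (old, new) name pairs under root; rename only when dry_run is False."""
    changes = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        new_name = swapped_name(path.name)
        if new_name != path.name:
            changes.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return changes
```

Calling `rename_tree("htdocs")` with the default `dry_run=True` only reports what would change ("htdocs" being a placeholder directory). Any links that pointed at the old names, such as the direct link to index.html.it mentioned in the original post, would need updating as well.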
|
|||
|
"Alan J. Flavell" <flavell@ph.gla.ac.uk> writes:
> You've already got an answer to that, but it should be no big deal
> to re-organise the naming to foobar.en.html, foobar.it.html, etc.,
> should it?

No, already did it, but still... I wanted to be sure, and I think it's worth warning people about.

> > If this is actually what google does, it should probably be
> > publicized (documented in apache's docs, for starters) so that
> > people don't run into the same problem.
> More to the point, perhaps Google could be persuaded to catch up
> with specifications. I wouldn't really want to see the Apache
> documentation itself bogged-down with short-term workarounds for
> failures by others to conform to published interworking
> specifications.

Someone mentioned that they have a FAQ... I don't know, it can't be something they haven't thought of.

> (yes, if you've found my web page on the topic - cited by the Apache
> tutorials - you can well imagine I'm considering adding a mention of
> this issue.)

Nope, sorry, wasn't aware of it. I'm pretty familiar with Apache :-)

--
David N. Welton
|
|||
|
On Sun, 26 Oct 2003, David N. Welton wrote:
> "Alan J. Flavell" <flavell@ph.gla.ac.uk> writes:
>
> > You've already got an answer to that, but it should be no big deal
> > to re-organise the naming to foobar.en.html, foobar.it.html, etc.,
> > should it?
>
> No, already did it, but still... I wanted to be sure, and I think it's
> worth warning people about.

Certainly. I'm afraid my reply might not have been worded too carefully: I hadn't meant it to be anything worse than informative - it was not at all intended as any criticism of what you had done.

> Someone mentioned that they have a FAQ... I don't know, it can't be
> something they haven't thought of.

For Apache 1.3 the nearest would seem to be
http://httpd.apache.org/docs/misc/FAQ.html#multiviews
and its link to
http://httpd.apache.org/docs/content-negotiation.html

For 2.0 you're at
http://httpd.apache.org/docs-2.0/faq/all_in_one.html
but there's not a lot there yet. You'd be better off with
http://httpd.apache.org/docs-2.0/con...gotiation.html

Neither version of the content-negotiation.html page addresses the specific issue that you raised, though.

> > (yes, if you've found my web page on the topic - cited by the Apache
> > tutorials - you can well imagine I'm considering adding a mention of
> > this issue.)
>
> Nope, sorry, wasn't aware of it. I'm pretty familiar with Apache:-)

I've now added a short sub-section at the end of
http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html
about this issue, though I suspect that - like all issues concerned with the details of search engines - the situation will change so fast that I'll be unable to provide accurate, up to date, information, so I've kept the description rather general and tentative.