google vs content negotiation (foo.html.it, foo.html.pt, etc)

  #1 (permalink)  
Old 10-23-2003
David N. Welton
 
Posts: n/a
Default google vs content negotiation (foo.html.it, foo.html.pt, etc)


Hi, I have web pages for my site, dedasys.com, in both English and
Italian, using files like index.html.it and index.html.en.

Google doesn't seem to be caching the Italian version of the page
though, after a month or two of it being up, so I suspect that the
extra .it (it's linked directly from the English page that way) may be
confusing it...

Anyone else observed this, or know more about it?

If this is actually what Google does, it should probably be publicized
(documented in Apache's docs, for starters) so that people don't run
into the same problem.

Thanks for your time,
--
David N. Welton
Consulting: http://www.dedasys.com/
Personal: http://www.dedasys.com/davidw/
Free Software: http://www.dedasys.com/freesoftware/
Apache Tcl: http://tcl.apache.org/
  #2 (permalink)  
Old 10-23-2003
Andreas Prilop
 
Posts: n/a
Default Re: google vs content negotiation (foo.html.it, foo.html.pt, etc)


davidw@dedasys.com (David N. Welton) wrote:

> Hi, I have web pages for my site, dedasys.com in both English and
> Italian, using files like index.html.it and index.html.en.
> Google doesn't seem to be caching the Italian version of the page
> though,


Idiots at work! The simpletons at Google index only documents with
certain "extensions". For example, ".htm8" ".mac" ".win" ending
documents are not indexed even if the type is text/html.
http://www.google.com/groups?th=7f00a97b04fa2509
The Google cretins cannot understand the concept of MIME types.

BTW: I suggest reading http://www.cs.tut.fi/~jkorpela/flags.html

--
http://www.unics.uni-hannover.de/nhtcapri/plonk.txt
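
To illustrate the distinction drawn in the post above, here is a minimal
Python sketch contrasting a crawler that filters on the final filename
"extension" with one that filters on the Content-Type reported by the
server. The allow-list and the function names are hypothetical, made up
for this example; nothing in this thread documents which rules Google
actually applies.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical allow-list, assumed purely for illustration.
# The empty string covers URLs such as "/" that have no extension.
KNOWN_EXTENSIONS = {".html", ".htm", ""}


def final_extension(url: str) -> str:
    """Return the last dot-suffix of the URL path, lower-cased."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower()


def accept_by_extension(url: str) -> bool:
    """Extension-based filter: looks only at the end of the URL."""
    return final_extension(url) in KNOWN_EXTENSIONS


def accept_by_media_type(content_type_header: str) -> bool:
    """MIME-based filter: looks only at the Content-Type header value."""
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


if __name__ == "__main__":
    url = "http://www.dedasys.com/index.html.it"
    header = "text/html; charset=iso-8859-1"
    print(final_extension(url))          # the suffix the first filter sees
    print(accept_by_extension(url))      # extension-based decision
    print(accept_by_media_type(header))  # MIME-based decision
```

Under these assumptions the two filters disagree on index.html.it: the
final suffix is ".it", which is not on the allow-list, while the header
still says text/html.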
  #3 (permalink)  
Old 10-23-2003
David N. Welton
 
Posts: n/a
Default Re: google vs content negotiation (foo.html.it, foo.html.pt, etc)

Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de> writes:

> davidw@dedasys.com (David N. Welton) wrote:


> > Hi, I have web pages for my site, dedasys.com in both English and
> > Italian, using files like index.html.it and index.html.en. Google
> > doesn't seem to be caching the Italian version of the page though,


> Idiots at work! The simpletons at Google index only documents with
> certain "extensions". For example, ".htm8" ".mac" ".win" ending
> documents are not indexed even if the type is text/html.
> http://www.google.com/groups?th=7f00a97b04fa2509
> The Google cretins cannot understand the concept of MIME types.


Ah, so my thesis was correct. Thanks for confirming!

> BTW: I suggest reading http://www.cs.tut.fi/~jkorpela/flags.html


Yes, I know and agree with it. Actually I just leave the flag there
so that people concerned with these things will be bothered by it;-)

--
David N. Welton
Consulting: http://www.dedasys.com/
Personal: http://www.dedasys.com/davidw/
Free Software: http://www.dedasys.com/freesoftware/
Apache Tcl: http://tcl.apache.org/
  #4 (permalink)  
Old 10-23-2003
Alan J. Flavell
 
Posts: n/a
Default Re: google vs content negotiation (foo.html.it, foo.html.pt, etc)

On Thu, 23 Oct 2003, David N. Welton wrote:

> Hi, I have web pages for my site, dedasys.com in both English and
> Italian, using files like index.html.it and index.html.en.
>
> Google doesn't seem to be caching the Italian version of the page
> though, after a month or two of it being up, so I suspect that the
> extra .it (it's linked directly from the English page that way) may be
> confusing it...


You've already got an answer to that, but it should be no big deal to
re-organise the naming to foobar.en.html, foobar.it.html, etc., should
it?

> If this is actually what google does, it should probably be publicized
> (documented in apache's docs, for starters) so that people don't run
> into the same problem.


More to the point, perhaps Google could be persuaded to catch up with
specifications. I wouldn't really want to see the Apache
documentation itself bogged down with short-term workarounds for
failures by others to conform to published interworking
specifications.

cheers

(yes, if you've found my web page on the topic - cited by the Apache
tutorials - you can well imagine I'm considering adding a mention of
this issue.)
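
The renaming suggested in the post above (foobar.html.it to
foobar.it.html) could be scripted roughly as follows. This is only a
sketch under stated assumptions: the set of language codes, the function
names and the dry-run default are all made up for this example, and any
links that point directly at the old names would still need updating
separately.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed set of language suffixes; extend it to match the site.
LANGUAGE_CODES = {"en", "it", "pt"}


def negotiated_name(filename: str) -> str:
    """Map 'foo.html.it' to 'foo.it.html'; leave other names unchanged."""
    parts = filename.split(".")
    if len(parts) >= 3 and parts[-1] in LANGUAGE_CODES and parts[-2] == "html":
        # Swap the last two suffixes so the name ends in ".html".
        parts[-2], parts[-1] = parts[-1], parts[-2]
        return ".".join(parts)
    return filename


def rename_tree(root: Path, dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Collect (old, new) pairs under root; rename only if dry_run is False."""
    planned = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        new_name = negotiated_name(path.name)
        if new_name != path.name:
            target = path.with_name(new_name)
            planned.append((path, target))
            if not dry_run:
                path.rename(target)
    return planned


if __name__ == "__main__":
    # Dry run over the current directory: print what would be renamed.
    for old, new in rename_tree(Path("."), dry_run=True):
        print(old, "->", new)
```

Running it with dry_run=True first shows the planned renames without
touching any files.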
  #5 (permalink)  
Old 10-26-2003
David N. Welton
 
Posts: n/a
Default Re: google vs content negotiation (foo.html.it, foo.html.pt, etc)

"Alan J. Flavell" <flavell@ph.gla.ac.uk> writes:

> You've already got an answer to that, but it should be no big deal
> to re-organise the naming to foobar.en.html, foobar.it.html, etc.,
> should it?


No, already did it, but still... I wanted to be sure, and I think it's
worth warning people about.

> > If this is actually what google does, it should probably be
> > publicized (documented in apache's docs, for starters) so that
> > people don't run into the same problem.


> More to the point, perhaps Google could be persuaded to catch up
> with specifications. I wouldn't really want to see the Apache
> documentation itself bogged-down with short-term workarounds for
> failures by others to conform to published interworking
> specifications.


Someone mentioned that they have a FAQ... I don't know, it can't be
something they haven't thought of.

> (yes, if you've found my web page on the topic - cited by the Apache
> tutorials - you can well imagine I'm considering adding a mention of
> this issue.)


Nope, sorry, wasn't aware of it. I'm pretty familiar with Apache:-)

--
David N. Welton
Consulting: http://www.dedasys.com/
Personal: http://www.dedasys.com/davidw/
Free Software: http://www.dedasys.com/freesoftware/
Apache Tcl: http://tcl.apache.org/
  #6 (permalink)  
Old 10-27-2003
Alan J. Flavell
 
Posts: n/a
Default Re: google vs content negotiation (foo.html.it, foo.html.pt, etc)

On Sun, 26 Oct 2003, David N. Welton wrote:

> "Alan J. Flavell" <flavell@ph.gla.ac.uk> writes:
>
> > You've already got an answer to that, but it should be no big deal
> > to re-organise the naming to foobar.en.html, foobar.it.html, etc.,
> > should it?
>
> No, already did it, but still... I wanted to be sure, and I think it's
> worth warning people about.


Certainly. I'm afraid my reply might not have been worded too
carefully: I hadn't meant it to be anything worse than informative -
it was not at all intended as any criticism of what you had done.

> Someone mentioned that they have a FAQ... I don't know, it can't be
> something they haven't thought of.


For Apache 1.3 the nearest would seem to be
http://httpd.apache.org/docs/misc/FAQ.html#multiviews
and its link to http://httpd.apache.org/docs/content-negotiation.html

For 2.0 you're at http://httpd.apache.org/docs-2.0/faq/all_in_one.html
but there's not a lot there yet. You'd be better off with
http://httpd.apache.org/docs-2.0/con...gotiation.html

Neither version of the content-negotiation.html page addresses the
specific issue that you raised, though.

> > (yes, if you've found my web page on the topic - cited by the Apache
> > tutorials - you can well imagine I'm considering adding a mention of
> > this issue.)
>
> Nope, sorry, wasn't aware of it. I'm pretty familiar with Apache:-)


I've now added a short sub-section at the end of
http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html about this
issue. I suspect, though, that - like all issues concerned with the
details of search engines - the situation will change so fast that
I'll be unable to provide accurate, up-to-date information, so I've
kept the description rather general and tentative.
 