Re: language negotiation and search engines

On Wed, 1 Sep 2004, Reto Bachmann-Gmuer wrote:

 >
 >
 > hello

Hi Reto,

 >
 > I'm wondering how search engines should handle pages with language
 > negotiation where the different language versions of a page have only one
 > url.
 >
 > A way for a search engine would be the following:
 > - the 1st request with all the handled languages with different q-values
 > in the accept-language-header.
 > e.g.
 > Accept-Language: rm; q=1, es; q=.99, de; q=.98, fr; q=.97, en; q=.95
 > (If a search engine wants to support all 137-iso languages this header
 > becomes quite long, not to mention language variants)
 >
 > - the second request accepts all languages except the language in which
 > the first request has been answered and all languages that had a higher
 > q-value than this one in the previous request. Repeat this until the
 > server returns a language-version that has already been returned before
 > or the list of remaining accept-languages is empty.
 > e.g.
 > When the first request has been answered with a resource in German (de),
 > the second request would be:
 > Accept-Language: fr; q=1, en; q=.99
 >
 > To reduce the number of requests necessary more seldomly available
 > languages should have higher q-values in the http-request.
 >
 > The disadvantage of this solution is that many resources have to be
 > requested more times than necessary, are there better solutions?
 > Wouldn't it be useful to have a http-response-header indicating all
 > available languages?

For the very general case, the above solution is probably a good (but
expensive) one if you really want to discover all languages.  However, it
is not guaranteed to work for all webservers, as webservers are free to do
`illogical' things like ignoring the exact contents (including quality
values) of the Accept-Language header you send.
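
The probing loop from the quoted proposal can be sketched roughly as
follows, in Python.  This is only an illustration under assumptions: the
names accept_language and discover_languages are made up here, and fetch
is a hypothetical callable that sends one request with the given
Accept-Language value and returns the language tag of the response (for
example from its Content-Language header), or None if it cannot tell.

```python
from typing import Callable, List, Optional


def accept_language(langs: List[str]) -> str:
    """Build an Accept-Language value with strictly decreasing q-values.

    Assumes fewer than 1000 languages, since q-values carry at most
    three decimal digits.
    """
    return ", ".join(
        f"{lang};q={1.0 - 0.001 * i:.3f}" for i, lang in enumerate(langs)
    )


def discover_languages(
    candidates: List[str],
    fetch: Callable[[str], Optional[str]],
) -> List[str]:
    """Probe one URL repeatedly and collect the languages it answers in.

    `fetch` is a hypothetical helper supplied by the caller; it performs
    the actual HTTP request.
    """
    remaining = list(candidates)
    found: List[str] = []
    while remaining:
        lang = fetch(accept_language(remaining))
        if lang is None or lang in found:
            break  # nothing new: stop probing
        found.append(lang)
        if lang not in remaining:
            break  # server ignored our preferences (e.g. a fallback)
        # Drop the returned language and everything ranked above it.
        remaining = remaining[remaining.index(lang) + 1:]
    return found
```

As noted above, a server that ignores the q-values makes the loop stop
early, so the result is a lower bound on the available languages rather
than a complete list.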

You are also asking if there are useful HTTP response headers.  There are.
But you might not get them by default.

Basically, web servers that implement (at least some parts of) transparent
content negotiation (RFC 2295) will include various mechanisms that are
helpful for the search engine.  The Apache server does implement
transparent content negotiation for language variants.  Of course, page
authors are not required to use the Apache module in question; they can
craft their own non-transparent language negotiation system if they want,
but in general they won't.

To give an example of how the transparent content negotiation mechanisms
help search engines, a search engine could send:

GET / HTTP/1.0
Host: www.debian.org
Negotiate: vlist

(so with a `Negotiate' header field from RFC 2295) and this would yield

HTTP/1.1 300 Multiple Choices
Date: Wed, 01 Sep 2004 20:34:53 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2
Alternates: {"index.ar.html" 1 {type text/html} {language ar} {length
17902}}, {"index.bg.html" 1 {type text/html} {language bg} {length
19560}}, {"index.ca.html" 1 {type text/html} {language ca} {length
16852}}, {"index.cs.html" 1 {type text/html} {language cs} {length
16962}}, {"index.da.html" 1 {type text/html} {language da} {length
16607}}, {"index.de.html" 1 {type text/html} {language de} {length
17217}}, {"index.el.html" 1 {type text/html} {language el} {length
17052}}, {"index.en-gb.html" 1 {type text/html} {language en-gb} {length
16726}}, {"index.en-us.html" 1 {type text/html} {language en-us} {length
16726}}, {"index.en.html" 1 {type text/html} {language en} {length
16726}}, {"index.eo.html" 1 {type text/html} {language eo} {length
16684}}, {"index.es.html" 1 {type text/html} {language es} {length
17210}}, {"index.fi.html" 1 {type text/html} {language fi} {length
16490}}, <etc, etc>

with an Alternates header that is very useful for the search
engine.
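
As a rough sketch of how a crawler might use this, the Python below sends
the Negotiate: vlist request and pulls the language tags out of the
Alternates value.  The function names are made up for illustration, the
regular expressions cover only the variant-description form shown above
(fallback variants without a quality value are skipped), and whether a
given server answers this way is an assumption.

```python
import http.client
import re
from typing import Dict, List, Optional

# One variant description: {"URI" source-quality {attr value} ...}
VARIANT = re.compile(r'\{\s*"([^"]*)"\s+([0-9.]+)((?:\s*\{[^{}]*\})*)\s*\}')
# One attribute inside a variant description: {name value}
ATTRIBUTE = re.compile(r'\{\s*([A-Za-z-]+)\s+([^{}]*?)\s*\}')


def parse_alternates(value: str) -> List[Dict[str, str]]:
    """Parse an Alternates header value into one dict per variant."""
    value = " ".join(value.split())  # undo header line folding
    variants = []
    for uri, quality, attrs in VARIANT.findall(value):
        variant = {"uri": uri, "qs": quality}
        for name, attr_value in ATTRIBUTE.findall(attrs):
            variant[name] = attr_value
        variants.append(variant)
    return variants


def available_languages(value: str) -> List[str]:
    """Return the distinct language tags listed in an Alternates value."""
    langs: List[str] = []
    for variant in parse_alternates(value):
        # A language attribute may hold a comma-separated list of tags.
        for tag in variant.get("language", "").split(","):
            tag = tag.strip()
            if tag and tag not in langs:
                langs.append(tag)
    return langs


def fetch_alternates(host: str, path: str = "/") -> Optional[str]:
    """Send a GET with `Negotiate: vlist`; return the Alternates value, if any."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path, headers={"Negotiate": "vlist"})
        return conn.getresponse().getheader("Alternates")
    finally:
        conn.close()
```

A crawler would call fetch_alternates once per URL and, if it gets a
value back, pass it to available_languages instead of probing with
repeated Accept-Language requests.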

Servers or URLs that do not implement transparent content negotiation can
still return Alternates headers as hints to search engines, even when the
language variants are all served under one top-level URL rather than under
different URLs, as transparent content negotiation requires.  I doubt that
this is used much, if at all, though.

 >
 > cheers,
 > reto

Hope this helps,

Koen.

Received on Thursday, 2 September 2004 05:04:24 UTC