- From: by way of Martin Duerst <k.holtman@chello.nl>
- Date: Thu, 02 Sep 2004 07:16:27 +0900
- To: www-international@w3.org
On Wed, 1 Sep 2004, Reto Bachmann-Gmuer wrote:

> hello

Hi Reto,

> I'm wondering how search engines should handle pages with language
> negotiation where the different language versions of a page have only one
> URL.
>
> A way for a search engine would be the following:
> - the 1st request with all the handled languages with different q-values
> in the Accept-Language header.
> e.g.
> Accept-Language: rm; q=1, es; q=.99, de; q=.98, fr; q=.97, en; q=.95
> (If a search engine wants to support all 137 ISO languages this header
> becomes quite long, not to mention language variants)
>
> - the second request accepts all languages except the language in which
> the first request has been answered and all languages that had a higher
> q-value than this one in the previous request. Repeat this until the
> server returns a language version that has already been returned before
> or the list of remaining accept-languages is empty.
> e.g.
> When the first request has been answered with a resource in German (de),
> the second request would be:
> Accept-Language: fr; q=1, en; q=.99
>
> To reduce the number of requests necessary, less commonly available
> languages should have higher q-values in the HTTP request.
>
> The disadvantage of this solution is that many resources have to be
> requested more times than necessary. Are there better solutions?
> Wouldn't it be useful to have an HTTP response header indicating all
> available languages?

For a very general case, the above solution is probably a good (but expensive) one if you really want to discover all languages. However, it is not guaranteed to work for all web servers, as web servers are free to do `illogical' things like ignoring the exact contents (including quality values) of the Accept-Language header you send.

You are also asking if there are useful HTTP response headers. There are, but you might not get them by default.
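The probing strategy from the quoted question can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not a crawler implementation: `fetch` is a hypothetical stand-in for one HTTP request that returns the language tag of the variant the server chose, and `make_fake_server` is a hypothetical toy server that simply picks the highest-q available language. As noted above, real servers need not behave this logically. The q-values step down by 0.001 (HTTP qvalues allow three decimals), which keeps up to 1000 languages strictly ordered.

```python
from typing import Callable, List, Optional

# Hypothetical type: performs one request with the given Accept-Language
# value and returns the language tag of the variant the server sent back.
Fetch = Callable[[str], Optional[str]]


def build_accept_language(langs: List[str]) -> str:
    """Format langs as an Accept-Language value with decreasing q-values."""
    parts = []
    for i, lang in enumerate(langs):
        # qvalues allow three decimals, so 0.001 steps keep up to 1000
        # languages strictly ordered; beyond that they tie at 0.001.
        milli = max(1000 - i, 1)
        parts.append(f"{lang};q={milli / 1000:.3f}")
    return ", ".join(parts)


def discover_languages(fetch: Fetch, candidates: List[str]) -> List[str]:
    """Repeat the request, each time dropping the returned language and every
    language ranked above it, until no new language comes back."""
    found: List[str] = []
    remaining = list(candidates)
    while remaining:
        lang = fetch(build_accept_language(remaining))
        if lang is None or lang in found:
            break  # no new variant: stop probing
        found.append(lang)
        if lang not in remaining:
            break  # server ignored our preferences; the list tells us nothing more
        # Languages ranked above `lang` would have been chosen if available.
        remaining = remaining[remaining.index(lang) + 1:]
    return found


def make_fake_server(available: List[str], default: str) -> Fetch:
    """Hypothetical server: returns the highest-q available language,
    or `default` when nothing in the header matches."""
    def fetch(accept_language: str) -> Optional[str]:
        best, best_q = None, 0.0
        for item in accept_language.split(","):
            tag, _, params = item.strip().partition(";")
            q = float(params.split("=", 1)[1]) if params else 1.0
            if tag in available and q > best_q:
                best, best_q = tag, q
        return best if best is not None else default
    return fetch


if __name__ == "__main__":
    server = make_fake_server(available=["de", "fr", "en"], default="en")
    print(discover_languages(server, ["rm", "es", "de", "fr", "en"]))
    # prints ['de', 'fr', 'en'] after three requests
```

Note that a language the crawler never lists (say `it` above) can never be discovered this way, which is why the candidate list, and thus the header, grows with the number of languages supported.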
Basically, web servers that implement (at least some parts of) transparent content negotiation (RFC 2295) will include various mechanisms that are helpful for a search engine. The Apache server does implement transparent content negotiation for language variants. Of course page authors are not required to use the Apache module in question; they can craft their own non-transparent language negotiation system if they want, but in general they won't.

To give an example of how the transparent content negotiation mechanisms help search engines, a search engine could send

    GET / HTTP/1.0
    Host: www.debian.org
    Negotiate: vlist

(that is, with a `Negotiate' header field from RFC 2295), and this would yield

    HTTP/1.1 300 Multiple Choices
    Date: Wed, 01 Sep 2004 20:34:53 GMT
    Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2
    Alternates: {"index.ar.html" 1 {type text/html} {language ar} {length 17902}},
     {"index.bg.html" 1 {type text/html} {language bg} {length 19560}},
     {"index.ca.html" 1 {type text/html} {language ca} {length 16852}},
     {"index.cs.html" 1 {type text/html} {language cs} {length 16962}},
     {"index.da.html" 1 {type text/html} {language da} {length 16607}},
     {"index.de.html" 1 {type text/html} {language de} {length 17217}},
     {"index.el.html" 1 {type text/html} {language el} {length 17052}},
     {"index.en-gb.html" 1 {type text/html} {language en-gb} {length 16726}},
     {"index.en-us.html" 1 {type text/html} {language en-us} {length 16726}},
     {"index.en.html" 1 {type text/html} {language en} {length 16726}},
     {"index.eo.html" 1 {type text/html} {language eo} {length 16684}},
     {"index.es.html" 1 {type text/html} {language es} {length 17210}},
     {"index.fi.html" 1 {type text/html} {language fi} {length 16490}},
     <etc, etc>

with an Alternates header that is very useful for the search engine.
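A crawler could read the language list straight out of such an Alternates header in a single request. The Python sketch below is an illustration under stated assumptions, not code from Apache or RFC 2295, and all helper names are hypothetical: `fetch_alternates` sends a request carrying `Negotiate: vlist` with the standard `http.client` module (which speaks HTTP/1.1 and adds the Host header itself), and `parse_alternates` splits the header into variant descriptions. The parser only covers the forms visible in the example (a quoted URI, an optional source quality, and `{name value}` attributes) and does not handle backslash escapes inside quoted strings. Whether any particular server still answers this way today is not assumed.

```python
import http.client
import re
from typing import Dict, List, Optional


def split_braced(s: str) -> List[str]:
    """Return the contents of each top-level {...} group, ignoring braces
    inside double-quoted strings (backslash escapes are not handled)."""
    groups: List[str] = []
    depth, start, in_quote = 0, 0, False
    for i, ch in enumerate(s):
        if ch == '"':
            in_quote = not in_quote
        elif in_quote:
            continue
        elif ch == "{":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(s[start:i])
    return groups


def parse_alternates(value: str) -> List[Dict[str, str]]:
    """Parse an Alternates header value into one dict per variant."""
    variants = []
    for body in split_braced(value):
        m = re.match(r'\s*"([^"]*)"\s*([0-9.]*)', body)
        if m is None:
            continue  # not a variant description; skip it
        variant = {"uri": m.group(1)}
        if m.group(2):
            variant["source-quality"] = m.group(2)
        for attr in split_braced(body):  # e.g. "language en-gb"
            name, _, val = attr.strip().partition(" ")
            variant[name] = val.strip()
        variants.append(variant)
    return variants


def languages(variants: List[Dict[str, str]]) -> List[str]:
    """Collect language tags; one language attribute may list several tags."""
    tags: List[str] = []
    for v in variants:
        for tag in v.get("language", "").split(","):
            if tag.strip():
                tags.append(tag.strip())
    return tags


def fetch_alternates(host: str, path: str = "/") -> Optional[str]:
    """Send a request with `Negotiate: vlist`; return the Alternates header,
    or None if the server does not send one."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path, headers={"Negotiate": "vlist"})
        return conn.getresponse().getheader("Alternates")
    finally:
        conn.close()


if __name__ == "__main__":
    sample = ('{"index.ar.html" 1 {type text/html} {language ar} {length 17902}}, '
              '{"index.en-gb.html" 1 {type text/html} {language en-gb} {length 16726}}')
    print(languages(parse_alternates(sample)))  # prints ['ar', 'en-gb']
```

The parser tracks brace depth rather than splitting on commas, because commas also occur inside a variant (a `language` attribute may list several tags).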
Servers or URLs that do not implement transparent content negotiation can still return Alternates headers to give hints to search engines, even when the language variants are not available under separate URLs (as transparent content negotiation requires) but only under a single top-level URL. I doubt that this is used much, if at all.

> cheers,
> reto

Hope this helps,

Koen.
Received on Thursday, 2 September 2004 05:04:24 UTC