Re: Transparent Content Negotiation & Search Engine Indexation.

On Fri, 5 Jan 2001, Vincent-Olivier Arsenault wrote:

> Hi all,
> 
> This message has been posted to the w3's list and to the technical support 
> for different major search engines. Please reply to the list (I will commit 
> the messages send only to me).
> 
> Simply put, let say that http://www.name.com is a pointer to 8 different 
> document (4 locales: en-us, en-ca, fr-ca, es-us. 2 document type: html, wml).
> 
> There is no translation, the content for the locales is COMPLETELY unrelated.
> 
> There is a link to switch to the 3 other locales on all the pages (both wml 
> and html).
> 
> The transparent content negotiation is based on the accept* and the 
> user-agent HTTP headers.
> 
> If there is no match in the available permutations (locales x dtd), or if 
> there is nothing specified, the default locale is en-us and the default DTD 
> is html.
> 
> Here are my questions:
> 
> Q1: Does the indexing robots make use of the content negotiation parameters 
> (http headers)? If so, in what way?
> 
> Q2: Does the search engines return result URIs from documents that were 
> obtained with a robot using content negotiation parameters (http headers) 
> that match those of the user?
> ie: Would a spanish user get http://www.name.com  (not 
> http://www.name.com/index.es.html, the link to the spanish section from 
> other locales) for keywords on the spanish version?

I am not a robot author but I did work on content negotiation standards.

For Q1, my guess is that robots currently make little or no use of the
content negotiation parameters. Negotiation can be done at the server in
many ways, and in general there is no reliable way for a robot to make
sense of all cases. The transparent content negotiation specification
does have a mechanism (the Alternates response header) which would allow
a robot to learn a lot about the structure of the negotiated content, but
servers will not always produce this header. If negotiation is done with
mod_negotiation in newer versions of Apache, the robot *can* get the
Alternates header if it wants to, but I don't know if any current robots
use this facility. Mod_negotiation does not currently support negotiation
based on the user-agent header, so for many advanced cases of negotiation
I expect there is still a lot of hand-coding in servers, without the
option of getting an Alternates header.

All in all, my answer to Q2 is: the engine will probably return the link
to the Spanish section (http://www.name.com/index.es.html), not the
http://www.name.com link.
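
To make the Alternates mechanism mentioned above a bit more concrete,
here is a rough Python sketch of how a robot might read the variant list
out of an Alternates header value. It assumes the header follows the
variant-list syntax of the transparent content negotiation specification
(RFC 2295), i.e. a comma-separated list of {"uri" quality {attribute
value} ...} entries. The function name parse_alternates and the variant
file names in the example are made up for illustration; this is not code
from any existing robot, and it ignores the less common parts of the
syntax (such as fallback variants and quoted descriptions).

```python
import re

# One variant description: {"uri" quality {attr value} {attr value} ...}
VARIANT_RE = re.compile(r'\{\s*"([^"]*)"\s+([0-9.]+)((?:\s*\{[^{}]*\})*)\s*\}')
# One attribute inside a variant description: {name value}
ATTR_RE = re.compile(r'\{\s*(\S+)\s+([^{}]*?)\s*\}')


def parse_alternates(header_value):
    """Return a list of dicts, one per variant found in the header value."""
    variants = []
    for uri, quality, attrs in VARIANT_RE.findall(header_value):
        variant = {"uri": uri, "quality": float(quality)}
        for name, value in ATTR_RE.findall(attrs):
            variant[name] = value
        variants.append(variant)
    return variants


# Hypothetical header value for two of the eight documents in the question.
example = (
    '{"index.es.html" 1.0 {type text/html} {language es}}, '
    '{"index.en.wml" 0.9 {type text/vnd.wap.wml} {language en}}'
)

for variant in parse_alternates(example):
    print(variant["uri"], variant.get("language"), variant.get("type"))
```

A robot that obtained such a list could index each variant under its own
URI and record its language and type, instead of guessing from a single
negotiated response. Whether a given server sends the header at all is,
as said above, not something the robot can count on.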

> 
> thanks,
> 
> vincent

Koen.

Received on Tuesday, 9 January 2001 05:51:48 UTC