W3C home > Mailing lists > Public > ietf-http-wg@w3.org > July to September 2012

Re: Managing Obsolete Information

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Thu, 06 Sep 2012 09:57:37 +0900
Message-ID: <5047F501.2030608@it.aoyama.ac.jp>
To: Karl Dubost <karld@opera.com>
CC: "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>

What about pages/information that is partly outdated but partly still 
current? What about pages that are not exactly up to date, but also 
not completely outdated (i.e. better than nothing)? HTTP has the q 
parameter, which might be able to express such gradations, but it's not 
exactly easy to use (which is not the q parameter's fault).
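As an aside, the q parameter Martin mentions is the quality value used in content negotiation headers such as Accept; a minimal sketch of parsing it (the function name and behaviour are illustrative, not from any library):

```python
# Minimal sketch: parsing HTTP q (quality) values from an Accept header.
# q expresses relative preference between 0 and 1 and defaults to 1.0;
# the idea above would reuse a similar gradation for "how fresh" a page is.

def parse_q_values(header: str) -> list[tuple[str, float]]:
    """Return (media-type, q) pairs sorted by descending preference."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0  # q defaults to 1.0 when absent
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        items.append((media_type, q))
    return sorted(items, key=lambda item: item[1], reverse=True)
```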

As for search engines, they already use all kinds of hints to figure out 
whether a page is still current, so they may not have much need for 
this (also because the chance that such information will be missing or 
wrong is high).

Regards,   Martin.

On 2012/09/06 2:03, Karl Dubost wrote:
> Maybe yet another silly question.
>
> Summary: How to manage the memory shoebox?
>           (or managing future obsolete information.)
>
> Not a proposal/question for HTTP/1.1, but for future version.
>
> It is a question that pops up often among people managing URIs on a site.
>
> Scenario:
> — Paul: "This documentation article is giving outdated information. Let's kill it."
> — Jane: "This information might have an historical interest. Let's not kill it."
>
> Then discussions start on the different ways to handle that. Some of the suggestions involve:
>
> * 410 Gone. Kill the content and ask clients to forget it.
> * 301 Moved Permanently. Kill the previous content and the previous URI, redirecting to the new content at a new URI.
> * 301 Moved Permanently. Move the old content to an archive.example.org web site and keep it there.
> * 200 OK for browsers and personal user agents, but 410 Gone or 403 for search engines (involving user-agent sniffing). The information stays accessible but not indexable, to avoid polluting search results.
> * 200 OK with noindex in the markup (works only for HTML).
> * robots.txt for blocking certain URIs (with all the issues that come with robots.txt).
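The user-agent-sniffing option in the list above could be sketched roughly as follows; the archived paths and crawler tokens are invented for illustration, and sniffing like this is fragile in practice:

```python
# Hedged sketch: serve 200 OK to browsers but 410 Gone to known crawlers,
# so an archived page stays readable without being indexed.

ARCHIVED_PATHS = {"/docs/old-api"}          # hypothetical archived URIs
CRAWLER_TOKENS = ("Googlebot", "bingbot")   # illustrative crawler markers

def status_for(path: str, user_agent: str) -> int:
    """Pick a status code for a request based on path and client."""
    if path not in ARCHIVED_PATHS:
        return 200
    if any(token in user_agent for token in CRAWLER_TOKENS):
        return 410  # Gone: ask indexers to drop the URI
    return 200      # humans still get the archived content
```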
>
> I'm not sure there is a right solution, and maybe it's fine as it is. I would imagine that sometimes a code in the 2xx series could be sent back, à la
>
> 2xx Obsolete or 2xx Archived
>
> It is different from Gone. It just says: the information at this URI is here, and we can give it to you, but note that it is not meant to be fresh. Clients coming to the site know their own goals (archiving bots, for example). Bots more focused on the new shiny things might decide to ignore this URI because it is obsolete, and prefer to spend their time indexing good fresh stuff. Bookmarks of people fond of old stuff will still work.
>
> Note that a Location: header could even be sent along when fresher content exists elsewhere. You may decide to go there or stay here.
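A client's side of Karl's hypothetical status code might look like this; the code 299 is invented purely for illustration and is not registered anywhere:

```python
# Sketch of a client handling a hypothetical "2xx Obsolete" response.
# If a Location header points at a fresher replacement, the client may
# follow it, but it is equally free to stay with the archived content.

HYPOTHETICAL_OBSOLETE = 299  # invented status code, not in any registry

def choose_uri(status: int, headers: dict[str, str], uri: str) -> str:
    """Decide which URI to use after receiving a response."""
    if status == HYPOTHETICAL_OBSOLETE and "Location" in headers:
        return headers["Location"]  # prefer the fresh replacement
    return uri  # plain 200, or obsolete with no replacement: stay here
```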
>
>
Received on Thursday, 6 September 2012 00:58:13 GMT
