- From: Rob van Eijk <rob@blaeu.com>
- Date: Tue, 11 Dec 2012 15:17:30 +0100
- To: <public-privacy@w3.org>
Dear all,
I am looking for feedback on the right to be forgotten in the domain of
search engines. The challenge for the concept of the right to be
forgotten is, IMHO, to add metadata to specific parts of the content on
websites. Adding metadata with XML can be used to accomplish that goal.
I would like to draw your attention to the Sitemap.XML file. An entry
looks like:
<url>
  <loc>http://www.voorbeeld.nl/papers/right-to-be-forgotten.html</loc>
  <lastmod>2005-01-01</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>
The Sitemap.XML protocol can be extended to cover data retention, for
instance by adding an expiration header <retention>30 days</retention>
or <retention>2013-12-31</retention>. This metadata can be tied to a
specific URL, in the case of the example above
<loc>http://www.voorbeeld.nl/papers/right-to-be-forgotten.html</loc>.
The application of metadata is not limited to HTML pages; it can also be
used for audio, pictures, video, etc. Often, sitemaps are dynamically
generated by the content management system. From a programmer's
perspective it is not difficult to enhance the module that generates the
sitemap.xml. Adding the metadata to existing sitemap.xml files outside
of a content management system, for example with a scheduled script, is
not a difficult task for a programmer either.
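As a rough illustration of the scheduled-script case, the Python sketch
below adds a retention element to every URL entry of an existing sitemap
that does not have one yet. The function name add_retention and the
lower-case element name retention are assumptions made for this sketch;
the element is the extension proposed in this message, not part of the
published Sitemap protocol.

```python
import xml.etree.ElementTree as ET


def add_retention(sitemap_xml: str, retention: str) -> str:
    """Return the sitemap with a <retention> child added to each <url> lacking one.

    Hypothetical helper: <retention> is the extension proposed in this
    message, not an element of the published Sitemap protocol.
    """
    root = ET.fromstring(sitemap_xml)
    # Reuse the root element's namespace (if any) so new elements match it.
    namespace = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if namespace:
        # Serialise that namespace as the default one (no ns0: prefixes).
        # Note: this registration is global to the ElementTree module.
        ET.register_namespace("", namespace[1:-1])
    for url in root.iter(f"{namespace}url"):
        # Leave entries that already carry a retention value untouched.
        if url.find(f"{namespace}retention") is None:
            ET.SubElement(url, f"{namespace}retention").text = retention
    return ET.tostring(root, encoding="unicode")
```

A cron job could read sitemap.xml, pass its content through this
function with the site's default retention date, and write the result
back.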
I think it is safe to say that a technical recommendation based on
adding an expiration header to the sitemap.xml file makes sense and is
useful.
Proposed text:
Adding an expiration header to the Sitemap may be an elegant way to
handle data retention policies for individual data elements on a
website. For retention-enhanced Sitemap.XML files to be effective, two
stakeholders need to be on the same page:
• Webmasters MAY consider the use of Sitemap.XML to add expiration
headers to the content they are offering. These headers are an
indication of data retention periods for specific parts of a site and
may include deep links to HTML-pages, but can also be used for audio,
pictures, video.
• Search engines MUST honour expiration headers in Sitemap.XML files,
and delete the search results accordingly. This includes the removal
from any search cache.
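To make the search engine side more concrete, here is a minimal Python
sketch of the decision a crawler would have to take for a single
retention value. The semantics are assumptions, since the proposal does
not fix them: "always" never expires, and a date in YYYY-MM-DD form is
taken as the last day on which the result may still be shown. Relative
values such as "30 days" are not handled and raise a ValueError.

```python
from datetime import date


def is_expired(retention: str, today: date) -> bool:
    """Decide whether a search result should be dropped (hypothetical helper).

    Assumed semantics: "always" never expires; a YYYY-MM-DD date is the
    last day on which the result may be shown.
    """
    value = retention.strip()
    if value == "always":
        return False
    # date.fromisoformat raises ValueError for anything that is not an ISO
    # date, so malformed values are reported rather than silently ignored.
    return today > date.fromisoformat(value)
```

A search engine following the MUST above would remove the result and
any cached copy as soon as this check returns True.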
XML schema for the enhanced Sitemap protocol (Sitemap.xsd):
<xsd:simpleType name="tRetention">
  <xsd:annotation>
    <xsd:documentation>
      OPTIONAL: Indicates the data retention time of a particular URL.
      The value "always" should be used to describe
      content that should not be removed. A date or dateTime value should
      be used to indicate the date after which the content can be
      removed from search results
      and the search cache. Please note that web crawlers may not
      necessarily crawl pages marked "always" more often.
    </xsd:documentation>
  </xsd:annotation>
  <!-- Either an xsd:date / xsd:dateTime value or the literal "always". -->
  <xsd:union memberTypes="xsd:date xsd:dateTime">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="always"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>
Sitemap protocol format consisting of XML tags (Sitemap.xml):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.site.com/schemas/sitemap/">
  <url>
    <loc>http://www.voorbeeld.nl/papers/right-to-be-forgotten.html</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
    <retention>2013-12-31</retention>
  </url>
</urlset>
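For completeness, a consumer could read the retention values from such a
file along the following lines. This is only a sketch: the function name
read_retention is hypothetical, and the namespace URI is simply the one
used in the example above.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace URI as used in the example sitemap above (an assumption of
# this sketch; adjust it to whatever namespace the real file declares).
NS = {"sm": "http://www.site.com/schemas/sitemap/"}


def read_retention(sitemap_xml: str) -> Dict[str, str]:
    """Map each <loc> in the sitemap to its <retention> value, if it has one."""
    root = ET.fromstring(sitemap_xml)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        retention = url.findtext("sm:retention", namespaces=NS)
        # Entries without a retention element are skipped, since the
        # element is optional in the proposed schema.
        if loc and retention:
            result[loc.strip()] = retention.strip()
    return result
```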
Please let me know if this approach is of use. If so, I would like to
learn where to address the problem in the standardization landscape: is
there an IETF working group or a W3C working group?
Kind regards,
Rob
Received on Tuesday, 11 December 2012 14:18:06 UTC