- From: Rob van Eijk <rob@blaeu.com>
- Date: Tue, 11 Dec 2012 16:29:26 +0100
- To: <public-privacy@w3.org>
Hi Thomas,

The functional requirement not to be indexed can already be met with
robots.txt. For content that has already been indexed, however, it is
more difficult. Sitemap.xml can indicate how often a page is likely to
change (<changefreq>), which hints at when a robot should revisit it.
It would strengthen the protocol if it were also possible to indicate,
at a granular level, the retention time of a specific content element.
That is the functionality I am interested in.

I would like to accomplish two things: first, on the webmaster side,
adding metadata to the content that signals the data subject’s wishes
to the outside world (e.g. an expiration date or do-not-index); and
second, extending the functionality of existing protocols in order to
implement more standardized data access rules for external parties,
search engines first and foremost.

Rob

Thomas Roessler wrote on 2012-12-11 16:08:
> Rob,
>
> I think it'd be useful to take a step back and say explicitly which
> particular instance of the right to be forgotten you're trying to
> implement here.
>
> What are the requirements that you're trying to address?
>
> Thanks,
> --
> Thomas Roessler, W3C <tlr@w3.org> (@roessler)
>
> On 2012-12-11, at 15:17 +0100, Rob van Eijk <rob@blaeu.com> wrote:
>
>>
>> Dear all,
>>
>> I am looking for feedback when it comes to the right to be forgotten
>> in the domain of search engines. The challenge for the concept of the
>> right to be forgotten is IMHO to add metadata to specific parts of
>> the content on websites. Adding metadata with XML can be used to
>> accomplish that goal. I would like to draw your attention to the
>> Sitemap.XML file. It looks like:
>>
>> <url>
>>   <loc>http://www.voorbeeld.nl/papers/right-to-be-forgotten.html</loc>
>>   <lastmod>2005-01-01</lastmod>
>>   <changefreq>monthly</changefreq>
>>   <priority>0.8</priority>
>> </url>
>>
>> The Sitemap.XML protocol can be extended when it comes to data
>> retention.
>> For instance, by adding an expiration element such as
>> <retention>30 days</retention> or <retention>2013-12-31</retention>.
>> This metadata can be tied to a specific URL, e.g.
>> <loc>http://www.example.org/papers/right-to-be-forgotten.html</loc>.
>> Such metadata is not limited to HTML pages; it can also be applied to
>> audio, pictures, video, etc. Sitemaps are often generated dynamically
>> by the content management system (CMS). From a programmer's
>> perspective, it is not difficult to extend the module that generates
>> sitemap.xml. Likewise, if one wishes to add metadata to existing
>> sitemap.xml files outside a CMS, doing so with a scheduled script is
>> not a difficult task for a programmer.
>>
>> I think it is safe to say that a technical recommendation based on
>> adding an expiration element to the sitemap.xml file makes sense and
>> is useful.
>>
>> Proposed text:
>> Adding an expiration element to the Sitemap may be an elegant way to
>> handle data retention policies for individual data elements on a
>> website. For retention-enhanced Sitemap.XML files to be effective,
>> two stakeholders need to be on the same page:
>> • Webmasters MAY use Sitemap.XML to add expiration elements to the
>>   content they offer. These elements indicate data retention periods
>>   for specific parts of a site and may include deep links to HTML
>>   pages, but can also be used for audio, pictures and video.
>> • Search engines MUST honour expiration elements in Sitemap.XML
>>   files and delete the corresponding search results accordingly.
>>   This includes removal from any search cache.
>>
>> XML schema for the enhanced Sitemap protocol (Sitemap.xsd):
>>
>> <xsd:simpleType name="tRetention">
>>   <xsd:annotation>
>>     <xsd:documentation>
>>       OPTIONAL: Indicates the data retention time of a particular URL.
>>       The value "always" should be used to describe content that
>>       should not be removed. A date or dateTime value (e.g. 2013-12-31)
>>       indicates the date after which the content is to be removed
>>       from search results and the search cache. Please note that web
>>       crawlers may not necessarily crawl pages marked "always" more
>>       often.
>>     </xsd:documentation>
>>   </xsd:annotation>
>>   <xsd:union memberTypes="xsd:date xsd:dateTime">
>>     <xsd:simpleType>
>>       <xsd:restriction base="xsd:string">
>>         <xsd:enumeration value="always"/>
>>       </xsd:restriction>
>>     </xsd:simpleType>
>>   </xsd:union>
>> </xsd:simpleType>
>>
>> Sitemap protocol format consisting of XML tags (Sitemap.xml):
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <urlset xmlns="http://www.site.com/schemas/sitemap/">
>>   <url>
>>     <loc>http://www.voorbeeld.nl/papers/right-to-be-forgotten.html</loc>
>>     <lastmod>2005-01-01</lastmod>
>>     <changefreq>monthly</changefreq>
>>     <priority>0.8</priority>
>>     <retention>2013-12-31</retention>
>>   </url>
>> </urlset>
>>
>> Please let me know if this approach is of use. If so, I would like
>> to learn where to address the problem in the standardization
>> landscape: is there an IETF working group or a W3C working group
>> for this?
>>
>> Kind regards,
>> Rob
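To make the search-engine side of the proposal concrete, here is a minimal Python sketch of how a crawler might read a retention-enhanced sitemap and list the URLs whose retention date has passed, so that they can be dropped from search results and the search cache. It assumes the placeholder namespace from the example sitemap, a <retention> element in that same namespace, and only the "always" and date/dateTime forms described in the schema documentation; a relative period such as "30 days" is not handled. The function names are hypothetical, and time of day and time zone are deliberately ignored.

```python
import datetime as dt
import xml.etree.ElementTree as ET
from typing import List, Optional

# Placeholder namespace copied from the example sitemap (assumption: the
# proposed <retention> element lives in the same namespace as <url>).
NS = {"sm": "http://www.site.com/schemas/sitemap/"}


def parse_retention(value: str) -> Optional[dt.date]:
    """Return the expiry date, or None for "always" (never expires)."""
    value = value.strip()
    if value == "always":
        return None
    # Accept an xsd:date ("2013-12-31") or an xsd:dateTime
    # ("2013-12-31T12:00:00Z"); only the date part is used, so time of
    # day and time zone are ignored in this sketch.
    return dt.date.fromisoformat(value[:10])


def expired_urls(sitemap_xml: str, today: dt.date) -> List[str]:
    """List the <loc> of every <url> whose retention date is before today."""
    root = ET.fromstring(sitemap_xml)
    expired = []
    for url in root.findall("sm:url", NS):
        retention = url.findtext("sm:retention", namespaces=NS)
        if retention is None:
            continue  # no <retention> element: nothing to enforce
        expiry = parse_retention(retention)
        # Content is kept through the expiry date and removed afterwards.
        if expiry is not None and expiry < today:
            loc = url.findtext("sm:loc", default="", namespaces=NS)
            expired.append(loc.strip())
    return expired
```

Treating a URL as expired only once its date is strictly in the past follows the schema wording that content is removed after the given date. A real crawler would also have to purge cached copies of each returned URL, which is outside this sketch.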
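The message notes that retention metadata can be added to an existing sitemap.xml with a scheduled script. A minimal Python sketch of such a script, using only the standard library's ElementTree, might look like the following. The function add_retention and its retention_by_loc mapping (URL to retention value, e.g. exported from a CMS or maintained by hand) are hypothetical, and the namespace is the placeholder from the example sitemap.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Placeholder namespace copied from the example sitemap.
SITEMAP_NS = "http://www.site.com/schemas/sitemap/"

# Serialize this namespace as the default xmlns instead of "ns0:".
ET.register_namespace("", SITEMAP_NS)


def _q(name: str) -> str:
    """Qualify a local element name with the sitemap namespace."""
    return f"{{{SITEMAP_NS}}}{name}"


def add_retention(sitemap_xml: str, retention_by_loc: Dict[str, str]) -> bytes:
    """Add or update <retention> for every <url> whose <loc> is mapped.

    Entries without a mapping are left untouched.
    """
    root = ET.fromstring(sitemap_xml)
    for url in root.iter(_q("url")):
        loc = (url.findtext(_q("loc")) or "").strip()
        value = retention_by_loc.get(loc)
        if value is None:
            continue
        retention = url.find(_q("retention"))
        if retention is None:
            # Appended last, after <priority>, as in the example sitemap.
            retention = ET.SubElement(url, _q("retention"))
        retention.text = value
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Registering the namespace with an empty prefix keeps it as the default namespace in the output; without it, ElementTree writes generated prefixes such as ns0:. A cron job could read the current sitemap, call the function, and write the returned bytes back to disk. Running it twice updates the existing element rather than adding a second one.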
Received on Tuesday, 11 December 2012 15:30:00 UTC