Re: [whatwg] Creative Commons Rights Expression Language

From: Dan Brickley <danbri@danbri.org>
Date: Mon, 25 Aug 2008 11:22:18 +0200
Message-ID: <48B279CA.9040807@danbri.org>
To: Kristof Zelechovski <giecrilj@stegny.2a.pl>
Cc: 'Julian Reschke' <julian.reschke@gmx.de>, 'Ben Adida' <ben@adida.net>, 'Ian Hickson' <ian@hixie.ch>, "'Bonner, Matt'" <matt.bonner@hp.com>, "'Tab Atkins Jr.'" <jackalmage@gmail.com>, 'Henri Sivonen' <hsivonen@iki.fi>, www-archive@w3.org

Kristof Zelechovski wrote:
> It is not metadata vs data, it is metadata vs content.  Data in HTML
> documents go into the SCRIPT element and they are usually expected to be
> private to the page.
> Chris

There's a significant body of work and thought around microformats (see 
below) that argues against keeping a separate and hidden pot of 
[meta]data. And in RDF land, we've found time and again that the 
distinction between so-called "metadata" and "data" is one that serves 
largely to confuse.

Re content vs [meta]data and microformats, e.g. see
http://tantek.com/log/2005/06.html#d03t2359 via
http://microformats.org/wiki/principles
[[
One of the principles of microformats is to be presentable and parsable. 
This means we prefer visible data to invisible metadata. This is one of 
the lessons we learned from the meta keywords debacle.

In the early days of HTML, authors used to place keywords for their 
pages in an invisible <meta> tag and search engines used this 
information, because the specifications said to do so. However, before 
long, in the realm of the Wild Wild Web, these meta keywords fell out of 
sync with the content on pages, were polluted, spammed, and otherwise 
abused until there was so much noise, any semblance of signal was lost. 
Along came a new search engine that ignored meta keywords, used visible 
hyperlinks instead, and instantly provided better results than all other 
existing search engines.

Lesson learned: hyperlinks, being visible by default, proved more 
reliable and persistently accurate for many reasons. Authors readily saw 
mistakes themselves and corrected them (because presentation matters). 
Readers informed authors of errors the authors missed, which were again 
corrected. This feedback led to an implied social pressure to be more 
accurate with hyperlinks thus encouraging authors to more often get it 
right the first time. When authors/sites abused visible hyperlinks, it 
was obvious to readers, who then took their precious attention somewhere 
else. Visible data like hyperlinks with the positive feedback loop of 
user/market forces encouraged accuracy and accountability. This was a 
stark contrast from the invisible metadata of meta keywords, which, 
lacking such a positive feedback loop, through the combination of gaming 
incentives and natural entropy, deteriorated into useless noise.
]]


In the RDF scene, many agree with the core claim here: data that is not
used, rots. We RDFish people perhaps tend to take a broader view of what
counts as use, and allow that the data might live primarily in, e.g., a
database or app, with its expression in HTML markup being a downstream
copy. So, for example, FOAF files that are generated automatically from a
"social network" site are vastly more likely to be up to date than FOAF
files that are hand-edited or were created by one-shot tools like
foaf-a-matic. The core data might live in the social network site's db
rather than in HTML, but the principle here is that data that's unused
and unseen by humans is unlikely to be kept accurate. Data embedded in
real-life activity is much healthier.
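
To make the "downstream copy" idea concrete, here is a rough sketch of
how a site might regenerate a FOAF file from whatever it holds in its
own database. The record layout (name, homepage, friends) and the
function name foaf_from_record are invented for illustration; only the
FOAF and RDF namespace URIs and the foaf:Person, foaf:name,
foaf:homepage and foaf:knows terms are real. It is not how any
particular site does it.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("foaf", FOAF_NS)


def foaf_from_record(record):
    """Serialise one (hypothetical) user record as FOAF in RDF/XML.

    `record` is assumed to be a dict with the keys "name", "homepage"
    and "friends" (a list of friend names), standing in for a row
    pulled from the site's own database.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF_NS}}}Person")
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = record["name"]
    ET.SubElement(
        person,
        f"{{{FOAF_NS}}}homepage",
        {f"{{{RDF_NS}}}resource": record["homepage"]},
    )
    # One foaf:knows link per friend currently stored for this user.
    for friend_name in record["friends"]:
        knows = ET.SubElement(person, f"{{{FOAF_NS}}}knows")
        friend = ET.SubElement(knows, f"{{{FOAF_NS}}}Person")
        ET.SubElement(friend, f"{{{FOAF_NS}}}name").text = friend_name
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up example data; a real site would read this from its db.
    print(foaf_from_record({
        "name": "Alice Example",
        "homepage": "http://example.org/alice",
        "friends": ["Bob Example"],
    }))
```

Because the file is rebuilt from the live record each time, it stays as
current as the data people are actually using on the site, which is the
point being made above.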

The microformats view tends towards putting data in human-readable blocks
of markup as a way of keeping it visible and alive. The RDF community
tends more towards making sure it can be consumed by multiple tools, so
that it is "seen" and consumed widely. Both generally agree that the
head section of an HTML document isn't usually the healthiest place to
store and manage [meta]data.

cheers,

Dan

--
http://danbri.org/
Received on Monday, 25 August 2008 09:23:06 UTC