- From: Danny Ayers <danny.ayers@gmail.com>
- Date: Wed, 8 Aug 2007 10:39:44 +0200
- To: public-grddl-wg <public-grddl-wg@w3.org>
I brought this up a while ago, and I think it may be relevant again in the context of "Bootstrapping...".

It seems likely that the majority of docs containing microformat data won't have an @profile, and possibly won't be XHTML either. RDF data may still be extracted from these docs by following various heuristics. This goes beyond GRDDL, but that doesn't prevent us from being pragmatic in the interests of bootstrapping the semweb.

But if the extracted RDF is made available on the web (possibly dynamically through an online service), the publisher's license for that interpretation can't be inferred through GRDDL.

There seem to be two primary ways of maximising reuse of data in these circumstances:

* the publisher (/converter) of the RDF assumes responsibility
* the publisher (/converter) of the RDF provides a description of the provenance chain

The first is essentially what we have by default. What I don't like about it is that the RDF publisher has more information about the source data (how they got RDF from it), but that information typically won't be available. So I reckon the second may be worth exploring, as the data may be more suitable for reuse. Compare:

a) "this data is provided by http://dannyayers.com/my-hacky-scaper"
b) "this data is provided by the detection of microformats according to microformats.org, the assumption of the hcard & hcal profiles and subsequent application of GRDDL"

Another possibility may be the application of out-of-band profiles. DocumentX may not include a profile itself, but the statement may be made in a third-party doc.

A combination of these approaches might be the definition of super-profiles that include the heuristics:

    GRDDL(Tidy-plus-hCard(documentX)) = result

At the minimum it might be useful to know, alongside any heuristically-derived triples, something like:

    <http://example.org/documentX> a MicroformatDocument ;
        mediaType "text/html" .

Cheers,
Danny.

--
http://dannyayers.com
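As a rough illustration of the "provenance chain" option, here is a minimal Python sketch that emits a Turtle description alongside the extracted triples. The `ex:` namespace, the `ex:mediaType` and `ex:derivedBy` terms, and the `describe_provenance` function are all hypothetical placeholders made up for this example; they are not part of GRDDL or any published vocabulary. String escaping is deliberately ignored to keep it short.

```python
# Hypothetical vocabulary: the ex: namespace and its terms are placeholders,
# not part of GRDDL or any published ontology.
EX = "http://example.org/vocab#"


def describe_provenance(doc_uri, media_type, steps):
    """Return a Turtle string describing how triples were derived from doc_uri.

    Assumes the inputs contain no characters that need escaping in Turtle.
    """
    lines = [
        f"@prefix ex: <{EX}> .",
        "",
        f"<{doc_uri}> a ex:MicroformatDocument ;",
        f'    ex:mediaType "{media_type}" ;',
    ]
    # An RDF collection preserves the order of the steps in the chain.
    step_list = " ".join(f'"{s}"' for s in steps)
    lines.append(f"    ex:derivedBy ( {step_list} ) .")
    return "\n".join(lines)


turtle = describe_provenance(
    "http://example.org/documentX",
    "text/html",
    ["Tidy", "assume hCard profile", "GRDDL"],
)
print(turtle)
```

A consumer could then decide whether to trust the heuristically-derived triples based on the listed steps, rather than on the converter's identity alone.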
Received on Wednesday, 8 August 2007 08:39:53 UTC