Re: Without @profile, who's license? from Harry Halpin on 2007-08-08 (public-grddl-wg@w3.org from August 2007)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Wed, 08 Aug 2007 12:39:24 -0400
To: Danny Ayers <danny.ayers@gmail.com>
Cc: public-grddl-wg <public-grddl-wg@w3.org>
Message-ID: <46B9F1BC.80302@ibiblio.org>

Danny Ayers wrote:
> I brought this up a while ago, I think it may be relevant again in the
> context of "Bootstrapping...".
>
> It seems like the majority of docs containing microformat data are not
> likely to have an @profile, and possibly not be XHTML either. It still
> means RDF data may be extracted from these docs following various
> heuristics. 
There is only one "legal" GRDDL way, which would be to get HTML 5 to
have an XML serialization and to put the GRDDL transforms for well-known
microformats at the HTML namespace document. I chatted with members of the
WG about the feasibility of this approach, and the response was that it
would take the consensus of 100,000s to get anything added to the HTML
namespace document. I would be willing to send an e-mail on behalf of
the GRDDL WG to the HTML 5 WG (although DanC said he would abstain since
the GRDDL WG did not directly produce those transformations) asking them
to investigate this possibility.

>  This goes beyond GRDDL, but that doesn't prevent us from
> being pragmatic in the interests of bootstrapping the semweb. But if
> the extracted RDF is made available on the web (possibly dynamically
> through an online service), the publisher's license for that
> interpretation can't be inferred through GRDDL.
>
> There seem to be two primary ways of maximising reuse of data in these
> circumstances:
>
> * the publisher (/converter) of the RDF assumes responsibility
> * the publisher (/converter) of the RDF provides a description of the
> provenance chain
>
> The first is essentially what we have by default. But what I don't
> like about it is that the RDF publisher has more information about
> the source data (how they got RDF from it), but that information
> typically won't be available. So I reckon the second may be worth
> exploring, the data may be more suitable for reuse. Compare:
>
> a) "this data is provided by http://dannyayers.com/my-hacky-scaper"
>
> b) "this data is provided by the detection of microformats according
> to microformats.org,  the assumption of the hcard & hcal profiles and
> subsequent application of GRDDL"
>   
I have a funny feeling there should be some provenance OWL or RDF (maybe
Kagal's Rei, named graphs) vocabulary that should be attached to the
results of this hypothetical GRDDL service. Any more ideas about how this
would be done would be great, and should be thought out just in case W3C
can allocate some resources to create a spec-client GRDDL service.
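
As a rough sketch of what attaching provenance to the service's output
could look like: the vocabulary at http://example.org/provenance# and the
helper describe_provenance below are made up for illustration (this is not
Rei or any published spec). It only emits N-Triples lines saying which
document the result came from, which service produced it, and which steps
were applied.

```python
# Hypothetical vocabulary: none of these terms come from a published spec.
PROV = "http://example.org/provenance#"


def describe_provenance(source, result, steps, service):
    """Return N-Triples lines describing how `result` was derived from `source`."""

    def uri(u):
        return "<%s>" % u

    def lit(s):
        # Escape backslashes and double quotes for an N-Triples string literal.
        return '"%s"' % s.replace("\\", "\\\\").replace('"', '\\"')

    triples = [
        (uri(result), uri(PROV + "derivedFrom"), uri(source)),
        (uri(result), uri(PROV + "producedBy"), uri(service)),
    ]
    # One numbered property per step, so the order of the chain is kept.
    for i, step in enumerate(steps, start=1):
        triples.append((uri(result), uri(PROV + "step%d" % i), lit(step)))
    return ["%s %s %s ." % t for t in triples]


lines = describe_provenance(
    "http://example.org/documentX",
    "http://example.org/documentX.rdf",
    ["detect microformats", "assume hCard profile", "apply GRDDL"],
    "http://example.org/grddl-service",
)
print("\n".join(lines))
```

A real service would presumably reuse an existing vocabulary or named
graphs rather than numbered step properties.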
> Another possibility may be the application of out-of-band profiles.
> DocumentX may not include a profile itself, but the statement may be
> made in a third-party doc.
>
> A combination of these approaches might be the definition of
> super-profiles that include the heuristics -
>
> GRDDL(Tidy-plus-hCard(documentX)) = result
>
> At the minimum it might be useful to know, alongside any
> heuristically-derived triples something like:
>
> <http://example.org/documentX> a MicroformatDocument ;
>     mediaType "text/html" .
>
> Cheers,
> Danny.
>
>   
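
The super-profile idea, GRDDL(Tidy-plus-hCard(documentX)), can be read as
plain function composition that also remembers which steps it ran. A
minimal sketch, assuming placeholder steps: super_profile, tidy and hcard
are hypothetical names, and the two step functions are stand-ins rather
than real calls to HTML Tidy or an hCard transform.

```python
from functools import reduce


def super_profile(*steps):
    """Compose (name, function) steps into one pipeline that reports its chain."""

    def run(document):
        # Feed the document through each step in order.
        result = reduce(lambda acc, step: step[1](acc), steps, document)
        return result, [name for name, _ in steps]

    return run


# Placeholder steps: real ones would clean up the markup and extract RDF.
def tidy(html):
    return html.strip()


def hcard(xhtml):
    return {"source_length": len(xhtml)}


tidy_plus_hcard = super_profile(("Tidy", tidy), ("hCard", hcard))
result, chain = tidy_plus_hcard("  <p class='vcard'>x</p> ")
print(chain, result)
```

The returned chain is the part that matters here: it is the list of
heuristics a consumer would need to see alongside the extracted triples.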


-- 
  -harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426

Received on Wednesday, 8 August 2007 16:39:34 UTC