Without @profile, whose license?

I brought this up a while ago, and I think it may be relevant again in
the context of "Bootstrapping...".

It seems like the majority of docs containing microformat data are not
likely to have an @profile, and possibly won't be XHTML either. RDF
data may still be extracted from these docs by following various
heuristics.  This goes beyond GRDDL, but that doesn't prevent us from
being pragmatic in the interests of bootstrapping the semweb. But if
the extracted RDF is made available on the web (possibly dynamically
through an online service), the publisher's license for that
interpretation can't be inferred through GRDDL.

There seem to be two primary ways of maximising reuse of data in these
cases:

* the publisher (/converter) of the RDF assumes responsibility
* the publisher (/converter) of the RDF provides a description of the
provenance chain

The first is essentially what we have by default. But what I don't
like about it is that the RDF publisher has more information about
the source data (how they got RDF from it), but that information
typically won't be available. So I reckon the second may be worth
exploring, since the data may be more suitable for reuse. Compare:

a) "this data is provided by http://dannyayers.com/my-hacky-scaper"

b) "this data is provided by the detection of microformats according
to microformats.org,  the assumption of the hcard & hcal profiles and
subsequent application of GRDDL"
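
To make (b) a bit more concrete, here is a rough Python sketch of a
provenance chain kept as data next to the extracted RDF. The class and
field names (ProvenanceStep, Provenance, heuristic) are made up for
illustration and don't come from GRDDL or any existing vocabulary; the
point is only that each step records what it relied on and whether the
source document actually declared it.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStep:
    """One step on the way from the source document to RDF (hypothetical model)."""
    action: str      # what was done, e.g. "apply GRDDL"
    basis: str       # the spec or assumption the step relies on
    heuristic: bool  # True if the source document did not declare this itself


@dataclass
class Provenance:
    """The ordered chain of steps applied to one source document."""
    source: str
    steps: list = field(default_factory=list)

    def is_heuristic(self) -> bool:
        # The result is only as authoritative as its least authoritative step.
        return any(step.heuristic for step in self.steps)

    def describe(self) -> str:
        parts = []
        for step in self.steps:
            flag = " (assumed)" if step.heuristic else ""
            parts.append(f"{step.action} per {step.basis}{flag}")
        return f"{self.source}: " + "; ".join(parts)


# Statement (b) above, expressed as a chain for an example document.
chain = Provenance(
    source="http://example.org/documentX",
    steps=[
        ProvenanceStep("detect microformats", "microformats.org", True),
        ProvenanceStep("assume hcard & hcal profiles", "converter heuristics", True),
        ProvenanceStep("apply GRDDL", "GRDDL", False),
    ],
)
print(chain.describe())
```

A consumer could then use something like is_heuristic() to decide how
far to trust (or how to license) the derived triples.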

Another possibility may be the application of out-of-band profiles.
DocumentX may not include a profile itself, but the statement may be
made in a third-party doc.
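
As a sketch of the out-of-band idea (Python, with an invented function
name and example.org URIs standing in for real documents and profiles),
a converter could merge the profiles a document declares itself with
those asserted about it elsewhere, while keeping track of who made each
assertion:

```python
def profiles_for(doc_uri, declared, third_party):
    """Return (profile, asserted_by) pairs for a document.

    declared:    profiles found in the document's own @profile
    third_party: mapping of asserting-document URI -> {doc_uri: [profile, ...]}
    Both structures are assumptions made for this sketch.
    """
    # Profiles the document declares are asserted by the document itself.
    found = [(profile, doc_uri) for profile in declared]
    # Out-of-band profiles are attributed to whoever made the statement.
    for asserter, claims in third_party.items():
        for profile in claims.get(doc_uri, []):
            found.append((profile, asserter))
    return found


doc_x = "http://example.org/documentX"
claims = {
    "http://example.org/third-party-doc": {
        doc_x: ["http://example.org/profiles/hcard"],
    },
}

# documentX has no @profile of its own, so only the third-party claim shows up.
print(profiles_for(doc_x, [], claims))
```

Keeping the asserter alongside each profile means the question of whose
interpretation (and whose license) applies stays answerable.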

A combination of these approaches might be the definition of
super-profiles that include the heuristics -

GRDDL(Tidy-plus-hCard(documentX)) = result
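
Spelled out as a Python sketch, with stub stages rather than real Tidy
or GRDDL implementations (every function name here is a placeholder,
and the rule that the grddl stub insists on XHTML plus a profile is a
simplification for this example):

```python
HCARD = "http://example.org/profiles/hcard"  # placeholder profile URI


def tidy(doc):
    # Stub: a real converter would repair tag-soup HTML into XHTML here.
    return {**doc, "xhtml": True, "applied": doc["applied"] + ["tidy"]}


def assume_hcard(doc):
    # Heuristic stage: add a profile the document never declared.
    return {**doc,
            "profiles": doc["profiles"] + [HCARD],
            "applied": doc["applied"] + ["assume-hcard"]}


def grddl(doc):
    # Stub: in this sketch GRDDL refuses input without XHTML and a profile.
    if not (doc["xhtml"] and doc["profiles"]):
        raise ValueError("needs XHTML with at least one profile")
    return {**doc, "applied": doc["applied"] + ["grddl"]}


def super_profile(*stages):
    """Bundle heuristic stages into one named step, applied left to right."""
    def run(doc):
        for stage in stages:
            doc = stage(doc)
        return doc
    return run


tidy_plus_hcard = super_profile(tidy, assume_hcard)

document_x = {"uri": "http://example.org/documentX",
              "xhtml": False, "profiles": [], "applied": []}

# GRDDL(Tidy-plus-hCard(documentX)) = result
result = grddl(tidy_plus_hcard(document_x))
print(result["applied"])
```

The "applied" list is what makes the heuristics visible afterwards: the
result carries a record of everything done to documentX before GRDDL
was run.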

At the minimum it might be useful to know, alongside any
heuristically-derived triples something like:

<http://example.org/documentX> a MicroformatDocument ;
    mediaType "text/html" .




Received on Wednesday, 8 August 2007 08:39:53 UTC