Re: Microdata to RDF: First Editor's Draft (ACTION-6) from Jeni Tennison on 2011-10-16 (public-html-data-tf@w3.org from October 2011)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sun, 16 Oct 2011 07:52:00 +0100
To: KANZAKI Masahide <mkanzaki@gmail.com>, Gregg Kellogg <gregg@kellogg-assoc.com>, Martin Hepp <martin.hepp@ebusiness-unibw.org>
Cc: public-html-data-tf@w3.org
Message-Id: <63E5EE9A-7835-44C3-B3BA-FF4600A96092@jenitennison.com>

On 16 Oct 2011, at 02:32, KANZAKI Masahide wrote:
> 2011/10/14 Gregg Kellogg <gregg@kellogg-assoc.com>:
>> Note, on a separate list [7] that mfhepp worried about the use of collections at all, as that does not allow appropriate Good Relations mappings. We really need to solicit more input on the whole notion of preserving order through collections when deriving RDF from Microdata.
> 
> I agree with mfhepp, that mapping to RDF Collection doesn't seem to be
> good idea.
> 
> Most RDF models do not expect collection for multiple values. For
> example, when a bibliography page has multiple subjects, generating
> rdf:List as object of dc:subject would upset librarians. Same for
> tags, skos:altLabels, etc.
> 
> Also, most RDF properties do not have rdf:List as their range, thus
> generating collection would result in non conforming (or
> contradicting) triples.

The rationale for always mapping to a collection when there are multiple values is that it preserves the order in case it *is* needed. It's easy(ish) to remove ordering if you don't want it, but impossible to reinstate order once it's been lost if you do. See [1] for more detailed reasoning.

But what if, rather than assuming a generic parse followed by some post-processing, we explicitly left it up to implementations of the algorithm? We could say that each of the points where knowledge of the vocabulary would make you do things differently is implementation-defined within particular constraints. So we would have something like:

  * the _property_URI_creation_method_ is one of X, Y or Z (TBD) and is implementation defined
  * the _datatype_ for a literal value is implementation defined
  * the _multi-value_mapping_ is either _to_a_collection_ or _to_multiple_statements_ and is implementation defined
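
To make the last of these concrete, here is a rough sketch of how the two _multi-value_mapping_ options might differ in the triples they produce. It is only an illustration, not the draft's algorithm: the function name map_values, the tuple representation of triples, and the blank node labels are all assumptions made for the example, and it assumes a single value is always emitted as a plain statement.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical blank node generator, just for this sketch.
_bnode_ids = count()


def new_bnode():
    return f"_:b{next(_bnode_ids)}"


def map_values(subject, prop, values, multi_value_mapping="to_a_collection"):
    """Return (subject, predicate, object) tuples for one property.

    multi_value_mapping is the implementation-defined setting:
    "to_a_collection" keeps document order in an rdf:List,
    "to_multiple_statements" emits one unordered statement per value.
    """
    if not values:
        return []
    # Assumption: a single value never needs a collection.
    if len(values) == 1:
        return [(subject, prop, values[0])]
    if multi_value_mapping == "to_multiple_statements":
        return [(subject, prop, value) for value in values]
    if multi_value_mapping == "to_a_collection":
        nodes = [new_bnode() for _ in values]
        triples = [(subject, prop, nodes[0])]
        for i, (node, value) in enumerate(zip(nodes, values)):
            # The last list node points at rdf:nil to close the list.
            rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
            triples.append((node, RDF + "first", value))
            triples.append((node, RDF + "rest", rest))
        return triples
    raise ValueError(f"unknown multi-value mapping: {multi_value_mapping}")
```

With two values, "to_multiple_statements" yields two triples that share the subject and property, while "to_a_collection" yields one triple pointing at the head of an rdf:List plus an rdf:first/rdf:rest pair for each value.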

Implementations themselves would then be free to use whatever method was suitable for them to determine how to set each of these, which might include some combination of:

  * having hard-coded knowledge of particular vocabularies
  * looking up what to do from a registry
  * working out what to do based on a schema or ontology
  * having some fixed defaults that will work in 99% of cases
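
One possible way of combining these sources, sketched very loosely: check hard-coded vocabulary knowledge first, then a registry, then fall back to a fixed default. Everything named here is hypothetical (the HARD_CODED table, the registry being a plain dictionary, the choice of default, and the function name); a real implementation could order or source these quite differently.

```python
# Hypothetical fixed default: keep order unless told otherwise.
DEFAULT_MAPPING = "to_a_collection"

# Hypothetical hard-coded vocabulary knowledge: properties whose
# consumers expect plain repeated statements rather than an rdf:List.
HARD_CODED = {
    "http://purl.org/dc/terms/subject": "to_multiple_statements",
}


def choose_multi_value_mapping(prop, registry=None):
    """Pick the multi-value mapping for a property URI.

    registry is assumed to be a dict from property URI to mapping
    name, standing in for whatever lookup service a tool might use.
    """
    if prop in HARD_CODED:
        return HARD_CODED[prop]
    if registry is not None and prop in registry:
        return registry[prop]
    return DEFAULT_MAPPING
```

Because the lookup order and the default are visible in one place, a tool built this way can document exactly how it will behave for any given property.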

This would provide enough of a framework that individual implementations didn't each have to reinvent how to do everything, while still giving them the ability to insert vocabulary knowledge early in the process. It would also give a guarantee (by making it implementation defined rather than implementation determined) that the users of a tool will be informed about the tool's behaviour.

What do you think? Would this work as an approach?

Jeni

[1] http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0015.html
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Sunday, 16 October 2011 06:52:30 UTC