
Re: Microdata to RDF: First Editor's Draft (ACTION-6)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sun, 16 Oct 2011 07:52:00 +0100
Cc: public-html-data-tf@w3.org
Message-Id: <63E5EE9A-7835-44C3-B3BA-FF4600A96092@jenitennison.com>
To: KANZAKI Masahide <mkanzaki@gmail.com>, Gregg Kellogg <gregg@kellogg-assoc.com>, Martin Hepp <martin.hepp@ebusiness-unibw.org>

On 16 Oct 2011, at 02:32, KANZAKI Masahide wrote:
> 2011/10/14 Gregg Kellogg <gregg@kellogg-assoc.com>:
>> Note, on a separate list [7] that mfhepp worried about the use of collections at all, as that does not allow appropriate Good Relations mappings. We really need to solicit more input on the whole notion of preserving order through collections when deriving RDF from Microdata.
> I agree with mfhepp, that mapping to RDF Collection doesn't seem to be
> good idea.
> Most RDF models do not expect collection for multiple values. For
> example, when a bibliography page has multiple subjects, generating
> rdf:List as object of dc:subject would upset librarians. Same for
> tags, skos:altLabels, etc.
> Also, most RDF properties do not have rdf:List as their range, thus
> generating collection would result in non conforming (or
> contradicting) triples.

The rationale for always mapping to a collection when there are multiple values is that it preserves the order in case it *is* needed. It's easy(ish) to remove ordering if you don't want it; impossible to reinstate order once it's been lost if you do. See [1] for more detailed reasoning.

But what if, rather than assuming a generic parse followed by some post-processing, we explicitly left it up to implementations of the algorithm? We could say that each of the various things where knowledge of the vocabulary would make you do things differently is implementation-defined within particular constraints. So we would have something like:

  * the _property_URI_creation_method_ is one of X, Y or Z (TBD) and is implementation defined
  * the _datatype_ for a literal value is implementation defined
  * the _multi-value_mapping_ is either _to_a_collection_ or _to_multiple_statements_ and is implementation defined
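
To make the last of these concrete, here is a rough Python sketch of what the two _multi-value_mapping_ settings might produce for a single property. The function name, the tuple representation of triples and the blank node labels are all made up for illustration; only the two setting names come from the list above.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def map_values(subject, predicate, values, multi_value_mapping):
    """Return (subject, predicate, object) tuples for one property.

    Hypothetical helper: triples are plain tuples of strings and blank
    nodes are labelled _:b0, _:b1, ... purely for illustration.
    """
    if multi_value_mapping not in ("to_a_collection", "to_multiple_statements"):
        raise ValueError("unknown multi-value mapping: %r" % multi_value_mapping)

    # A single value never needs a collection; neither does the
    # to_multiple_statements setting, which drops document order.
    if len(values) <= 1 or multi_value_mapping == "to_multiple_statements":
        return [(subject, predicate, value) for value in values]

    # to_a_collection: build an rdf:List so document order is preserved.
    labels = count()
    nodes = ["_:b%d" % next(labels) for _ in values]
    triples = [(subject, predicate, nodes[0])]
    for i, (node, value) in enumerate(zip(nodes, values)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", value))
        triples.append((node, RDF + "rest", rest))
    return triples
```

With two dc:subject values, _to_multiple_statements_ gives two ordinary triples with the same subject and predicate, whereas _to_a_collection_ gives one triple pointing at the head of a two-element list.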

Implementations themselves would then be free to use whatever method was suitable for them to determine how to set each of these, which might include some combination of:

  * having hard-coded knowledge of particular vocabularies
  * looking up what to do from a registry
  * working out what to do based on a schema or ontology
  * having some fixed defaults that will work in 99% of cases
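
As a rough illustration of how a tool might combine these, the Python sketch below tries a hard-coded table first, then a registry, and finally falls back to fixed defaults. Everything in it is hypothetical: the Settings class, the example.org vocabulary URI, the shape of the registry entries and the choice of _to_a_collection_ as the default are assumptions, and only the multi-value setting is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical per-vocabulary settings; only one setting is modelled."""
    multi_value_mapping: str


# Assumed fixed default: keep order unless told otherwise.
DEFAULTS = Settings(multi_value_mapping="to_a_collection")

# Assumed hard-coded knowledge about a made-up vocabulary.
HARD_CODED = {
    "http://example.org/vocab/": Settings("to_multiple_statements"),
}


def from_hard_coded(vocabulary):
    """Return built-in settings for a vocabulary, or None if unknown."""
    return HARD_CODED.get(vocabulary)


def make_registry_lookup(registry):
    """Wrap a dict standing in for a registry that a tool might consult."""
    def lookup(vocabulary):
        entry = registry.get(vocabulary)
        if entry is None:
            return None
        return Settings(entry["multi_value_mapping"])
    return lookup


def resolve(vocabulary, strategies):
    """Try each strategy in turn; fall back to the fixed defaults."""
    for strategy in strategies:
        settings = strategy(vocabulary)
        if settings is not None:
            return settings
    return DEFAULTS
```

A tool using something like this would document the order of its strategies and its defaults, so users know which behaviour they will get for a given vocabulary. Working out settings from a schema or ontology would just be another strategy in the list.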

This would provide enough of a framework that individual implementations wouldn't each have to reinvent how to do everything, while still giving them the ability to insert vocabulary knowledge early in the process, and a guarantee (by making it implementation defined rather than implementation determined) that the users of a tool will be informed about the tool's behaviour.

What do you think? Would this work as an approach?


[1] http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0015.html
Jeni Tennison
Received on Sunday, 16 October 2011 06:52:30 UTC
