Re: Proposal to resolve ISSUE-1

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Mon, 31 Oct 2011 15:33:05 -0400
To: Ivan Herman <ivan@w3.org>
CC: Gregg Kellogg <gregg@kellogg-assoc.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-ID: <9039B108-D487-4049-97D3-145EAB4809A4@greggkellogg.net>
On Oct 31, 2011, at 2:04 AM, Ivan Herman wrote:

> Gregg,
> I am not sure I understand exactly what you mean by the various choices. Can you give examples with some well known vocabularies, like schema, dc, foaf, vcard, gr?

Looking at the examples in [2] might be useful. If we had the following in the registry:

  "http://purl.org/vocab/frbr/core#": {
    "propertyURI": "vocabulary"
  }

With this input:

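<dl itemscope
    itemtype="http://purl.org/vocab/frbr/core#Work"
    itemid="http://books.example.com/works/45U8QJGZSQKDH8N">
 <dd><cite itemprop="http://purl.org/dc/terms/title">Just a Geek</cite></dd>
 <dd><span itemprop="http://purl.org/dc/terms/creator">Wil Wheaton</span></dd>
 <dd itemprop="realization"
     itemscope
     itemtype="http://purl.org/vocab/frbr/core#Expression"
     itemid="http://books.example.com/products/9780596007683.BOOK">
  <link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/BOOK">
 </dd>
 <dd itemprop="realization"
     itemscope
     itemtype="http://purl.org/vocab/frbr/core#Expression"
     itemid="http://books.example.com/products/9780596802189.EBOOK">
  <link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/EBOOK">
 </dd>
</dl>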

We would get the following Turtle:

@base <http://books.example.com/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix md: <http://www.w3.org/1999/xhtml/microdata#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .

<> md:item <works/45U8QJGZSQKDH8N> .

<works/45U8QJGZSQKDH8N> a frbr:Work ;
  dc:creator "Wil Wheaton"@en ;
  dc:title "Just a Geek"@en ;
  frbr:realization (
    <products/9780596007683.BOOK>
    <products/9780596802189.EBOOK>
  ) .

<products/9780596007683.BOOK> a frbr:Expression ;
  dc:type <product-types/BOOK> .

<products/9780596802189.EBOOK> a frbr:Expression ;
  dc:type <product-types/EBOOK> .
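
As a rough illustration of how a converter might consult such a registry, here is a minimal Python sketch. It assumes the registry is an in-memory dict keyed by URI prefix; the function names, the longest-prefix match, the second registry entry, and the use of urllib's quote() as a stand-in for fragment-escaping are all my assumptions, not anything the proposal defines. The _contextual_ algorithm from [1] is deliberately left as a stub.

```python
from urllib.parse import quote, urlparse

# Hypothetical registry. Only the frbr entry appears in this message;
# the example.org entry is invented to exercise the _type_ scheme.
REGISTRY = {
    "http://purl.org/vocab/frbr/core#": {"propertyURI": "vocabulary"},
    "http://example.org/types/": {"propertyURI": "type"},
}


def lookup(itemtype):
    """Return (prefix, scheme) for the longest registered prefix, else None."""
    matches = [p for p in REGISTRY if itemtype.startswith(p)]
    if not matches:
        return None
    prefix = max(matches, key=len)
    return prefix, REGISTRY[prefix]["propertyURI"]


def property_uri(name, itemtype):
    """Generate a property URI for an itemprop name under a given @itemtype."""
    # Names that are already absolute URIs are used unchanged.
    if urlparse(name).scheme:
        return name
    # quote() only approximates the fragment-escaping the proposal refers to.
    escaped = quote(name, safe="")
    found = lookup(itemtype)
    scheme = found[1] if found else "contextual"
    if scheme == "vocabulary":
        return found[0] + escaped
    if scheme == "type":
        if "#" in itemtype:
            raise ValueError("_type_ scheme needs an @itemtype without a fragment")
        return itemtype + "#" + escaped
    # Fallback: the original algorithm from [1], not reproduced here.
    raise NotImplementedError("contextual URI generation is not sketched")


print(property_uri("realization", "http://purl.org/vocab/frbr/core#Work"))
```

With the frbr entry above, the non-URI name "realization" becomes frbr:realization, which matches the predicate in the Turtle output.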

> I am also concerned by the fact microdata->rdf converters would have to consult the registry (ok, it can be cached, but nevertheless) for each and every @itemtype. This may be prohibitive. Also, a default mechanism should be made available in case the registry is unreachable, and this default should correspond to the most frequent usage (my feeling is that the most frequent usage was the approach you originally had, namely that the vocabulary base URI can be deduced from the @itemtype URI by cutting it back from its last component).

If there were a way to meet the requirements without a registry, that would be great. Aside from the WHATWG HTML5 spec defining vocabularies which use the _contextual_ URI generation scheme, there's schema.org's expansion [3], which makes detecting the proper vocabulary more challenging. Basically, a type (or property) can be expanded by adding to its URI path:

<http://schema.org/Person> could be a basic type, while <http://schema.org/Person/Deceased> could be an application-specific sub-type of Person. Without a registry, it would be more difficult to determine the proper URI prefix to use for the vocabulary.
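
To make the difficulty concrete, here is a small Python sketch of the default you describe (cutting the @itemtype URI back from its last component). The function name is hypothetical and the rule is my reading of that heuristic, not a specified algorithm.

```python
def guess_vocabulary(itemtype):
    """Cut the @itemtype URI back to just after its last '#' or '/'."""
    cut = max(itemtype.rfind("#"), itemtype.rfind("/"))
    return itemtype[:cut + 1]


# Works for a basic type and for a fragment-style vocabulary ...
print(guess_vocabulary("http://schema.org/Person"))
print(guess_vocabulary("http://purl.org/vocab/frbr/core#Work"))
# ... but an extended type yields a prefix that is not the schema.org vocabulary.
print(guess_vocabulary("http://schema.org/Person/Deceased"))
```

The last call returns <http://schema.org/Person/>, so a property such as "name" would end up under the wrong prefix unless something else identifies <http://schema.org/> as the vocabulary.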

Another thought I raised earlier was to do away with special URI processing rules and settle on one in particular (the previous, I would think). Then we could rely on pre-defined prefixes, equivalent to those defined in the RDFa default context, to allow for saner URI production. So for example, we could have the following:

<div itemscope itemtype="schema:Person/Deceased">
  <span itemprop="schema:name">Jane Doe</span>
  <img itemprop="schema:image" src="janedoe.jpg" />
</div>

Much simpler than requiring a registry. There's still the issue of multi-valued properties. I would support no longer placing them in RDF Collections, even at the cost of being out of conformance with Microdata ordering; I think unordered values are much more what people want to use. In any case, the unordered form could always be derived with post-processing entailment rules such as the following:

sss ppp bbb . bbb rdf:rest rrr => sss ppp rrr .
sss ppp bbb . bbb rdf:first vvv => sss ppp vvv .

And then ignoring or removing the BNode values.
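
A minimal Python sketch of that post-processing step, using plain (subject, predicate, object) tuples rather than any RDF library. The function name is hypothetical, and two simplifications are mine: the rules are not applied when the predicate is itself rdf:first or rdf:rest, and every node with an rdf:first or rdf:rest arc is treated as a list node to be dropped, so nested collections are not preserved.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def flatten_collections(triples):
    """Apply the two rules to a fixpoint, then drop the list scaffolding."""
    graph = set(triples)
    first = {s: o for s, p, o in graph if p == FIRST}
    rest = {s: o for s, p, o in graph if p == REST}

    changed = True
    while changed:
        changed = False
        for s, p, o in list(graph):
            if p in (FIRST, REST):
                continue  # simplification: leave the list structure itself alone
            # sss ppp bbb . bbb rdf:first vvv => sss ppp vvv .
            # sss ppp bbb . bbb rdf:rest  rrr => sss ppp rrr .
            for derived in (first.get(o), rest.get(o)):
                if derived is not None and (s, p, derived) not in graph:
                    graph.add((s, p, derived))
                    changed = True

    # Ignore the BNode list cells and rdf:nil, as subject or as object.
    list_nodes = set(first) | set(rest) | {NIL}
    return {(s, p, o) for s, p, o in graph
            if s not in list_nodes and o not in list_nodes}


work = "works/45U8QJGZSQKDH8N"
realization = "http://purl.org/vocab/frbr/core#realization"
book, ebook = "products/9780596007683.BOOK", "products/9780596802189.EBOOK"
collection = [
    (work, realization, "_:b1"),
    ("_:b1", FIRST, book), ("_:b1", REST, "_:b2"),
    ("_:b2", FIRST, ebook), ("_:b2", REST, NIL),
]
for triple in sorted(flatten_collections(collection)):
    print(triple)
```

For the frbr:realization collection from the earlier example, this leaves one plain triple per expression and nothing else.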


> Ivan 
> On Oct 28, 2011, at 19:23 , Gregg Kellogg wrote:
>> I am preparing an update to the Microdata to RDF specification. I propose we resolve ISSUE 1 as follows:
>> We define a registry mapping URI prefixes to property URI generation behavior with possible values of _vocabulary_, _type_, or _contextual_. @itemtypes which begin with a registered URI prefix will use the associated value of property URI generation behavior when generating property URIs, and otherwise fall back to _contextual_.
>> We also add a mapping from a URI prefix to the mechanism for serializing multi-valued properties with possible values _unordered_ and _list_.
>> The format of the registry is undefined, as is the update process. I think this is really a fairly complicated issue, and probably beyond the scope of this TF.
>> (Note there is some debate on if "registry" is the proper term, I'm sticking with it for now).
>> For non-URI property names:
>> _vocabulary_ URI generation constructs a URI by appending fragment-escaped property names to the URI prefix.
>> _type_ URI generation constructs a URI by appending '#' and the fragment-escaped property name to the @itemtype URI. This is only valid for @itemtype URIs which do not, themselves, contain a fragment.
>> _contextual_ URI generation uses the original property URI generation algorithm from [1].
>> When generating triples for a multi-valued property with a given _subject_ and _predicate_, the list of values is serialized as follows:
>> _unordered_ generates a triple with _subject_, _predicate_ and _value_ for each _value_ in the list of values.
>> _list_ generates an RDF Collection.
>> I'm marking the issue as PENDINGREVIEW.
>> Gregg
>> [1] http://www.w3.org/2011/htmldata/track/issues/1
[2] https://dvcs.w3.org/hg/htmldata/raw-file/74bd1c88b77d/microdata-rdf/index.html#markup-examples

> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 31 October 2011 19:36:51 UTC
