
Re: Updated Microdata to RDF spec

From: Ivan Herman <ivan@w3.org>
Date: Mon, 21 Nov 2011 10:58:53 +0100
Cc: HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <7022E151-64CA-4C25-BE15-10C708DA1CC9@w3.org>
To: Gregg Kellogg <gregg@kellogg-assoc.com>

thanks for folding in all the issues!

However... I must admit I am quite unhappy with the current design due to the necessity to use a registry. Introducing such a registry would, I believe, make the md->RDF conversion process way too complicated. We would also make each and every conversion depend on the network, with all the consequences for efficiency and the need to design around network failure. These are exactly the reasons why the RDFa WG, for example, dropped the @profile idea a while ago, in spite of the elegance of that approach.

Also: from an RDF usage point of view (and this is clearly my concern), what you call the 'contextual' approach will rarely be used by the RDF community in my view. If I take the schema.org example, a URI of the form


would be unnecessarily complex. Even if URI-s are opaque in RDF, practical usage can be hindered by such URI-s.

So... we may have to accept that, in some cases, the md->RDF conversion is lossy. Lossy in the sense that it may not reflect, by default, all the intentions of the microdata design (you yourself make this note in the document, in the section on "vocabulary").

To make things more specific, here are some thoughts that are, in my view, worth discussing.

1. Think about a "Conversion Lite" and a "Conversion Full". "Conversion Lite" should be usable without any registry whatsoever. We _may_ think about a "Conversion Full" for the few cases that do not work with Lite and where we must reflect the original design, e.g., the "contextual" option. My personal expectation, which is of course not proven at this point, is that the need for 'Full' may be minor. (See the items below for the various defaults.)

2. The "Lite" version for the property URI generation should be "vocabulary". At this moment I do not see any real use case for "contextual" out there on the Web, whereas "vocabulary" should work with, say, schema.org, which is clearly _the_ major use case for microdata, or with the hcard example, which would make it compatible with the current RDF mappings of cards.

3. As far as datatype generation is concerned, we should, mostly, keep away from that (certainly for "Lite"). Microdata does not care about datatypes, we should simply run with that and not try to outsmart microdata. The exception may be when a specific HTML element does define datatypes, like the <time> element. If the author cares about RDF Datatypes, he/she should use RDFa 1.1 Lite, whose complexity is comparable to that of microdata, ie, it would not make it more difficult to use. (Note that this is the same as the 'lang' issue. At the moment the md->RDF conversion does not care about language setting either.)

4. For the value ordering issue: why don't we do both? What I mean is: we can simply generate both the unordered

<> property "a", "b" .

_as well as_

<> property ("a" "b") .

triples. Yes, this would add some more triples, but so what? We are not talking about thousands of triples in the case of microdata, ie, I do not think this is really a huge practical issue. The user of the generated RDF can safely ignore the triples that are unwanted by the application.
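
To make items 2 and 4 a bit more concrete, here is a rough Python sketch of how a registry-free "Lite" conversion might build a property URI with the "vocabulary" scheme and then emit both the unordered and the list form. All function names are hypothetical, and the rule used to derive the vocabulary from the item type (cut after the last "/" or "#") is an assumption for illustration, not text taken from the draft.

```python
from __future__ import annotations


def vocabulary_of(item_type: str) -> str:
    """Assumed rule: keep the item type URI up to and including its last '/' or '#'."""
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    return item_type[: cut + 1]


def property_uri(item_type: str, name: str) -> str:
    """'vocabulary' scheme: vocabulary URI + property name.

    Property names that already look like absolute URIs are passed through.
    """
    if "://" in name:
        return name
    return vocabulary_of(item_type) + name


def turtle_literal(value: str) -> str:
    """Serialize a plain string as a Turtle literal (no datatype, no language)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def both_forms(subject: str, predicate: str, values: list[str]) -> list[str]:
    """Return two Turtle statements: the unordered values and the same values as a list."""
    literals = [turtle_literal(v) for v in values]
    unordered = f"{subject} <{predicate}> {', '.join(literals)} ."
    ordered = f"{subject} <{predicate}> ( {' '.join(literals)} ) ."
    return [unordered, ordered]


if __name__ == "__main__":
    predicate = property_uri("http://schema.org/Person", "name")
    for statement in both_forms("<>", predicate, ["a", "b"]):
        print(statement)
```

A consumer that does not care about ordering can simply skip the statement whose object is a list, and vice versa; nothing in the sketch needs a network lookup.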

Note that item #4 is where the central registry would fail most clearly. To take the schema.org example, they have a vocabulary set once and for all (http://schema.org/) but they will add new properties continuously. How would anyone make sure that those property descriptions end up in a central registry in time?

Another thought: we may think about folding into the md->RDF conversion the @vocab expansion mechanism of RDFa (maybe needless to say, but as an optional mechanism!). Some vocabularies, eg, schema.org, may set up such @vocab files anyway (we are already in discussion with DanBri on that), why not make use of those for this conversion, too?
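
As a rough illustration of what such an optional expansion step could do after the conversion, the Python sketch below adds, for every generated triple, the triples implied by sub-property statements found in a vocabulary file. It only covers the sub-property case, the `expand` function is hypothetical, and the schema.org to rdfs:label mapping in the example is made up for the sake of the example; it is not something schema.org publishes.

```python
def expand(triples, sub_property_of):
    """Add (s, q, o) for every (s, p, o) where p is a (transitive) sub-property of q.

    triples: iterable of (subject, predicate, object) tuples.
    sub_property_of: dict mapping a property URI to a list of its super-properties,
        as it might be read from a vocabulary file (assumed to be loaded already).
    """
    result = set(triples)
    pending = list(result)
    while pending:
        s, p, o = pending.pop()
        for q in sub_property_of.get(p, ()):
            inferred = (s, q, o)
            if inferred not in result:  # also guards against cycles in the mapping
                result.add(inferred)
                pending.append(inferred)
    return result


if __name__ == "__main__":
    # Hypothetical mapping, for illustration only.
    mapping = {
        "http://schema.org/name": ["http://www.w3.org/2000/01/rdf-schema#label"],
    }
    generated = {("_:item", "http://schema.org/name", "Gregg")}
    for triple in sorted(expand(generated, mapping)):
        print(triple)
```

Since the step runs on the output of the conversion, a processor that cannot or does not want to fetch the vocabulary file can skip it and still produce the basic triples.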



On Nov 19, 2011, at 03:16 , Gregg Kellogg wrote:

> I've completed a number of updates to the Microdata to RDF spec [1]. The live editor's draft is at [2]. I believe this addresses all of the issues that we've discussed. It's a pretty substantial update.
> This version introduces the registry, in an ad-hoc JSON form, which allows vocabularies and particular properties to take on special processing attributes. This includes property URI generation and if values are placed in an rdf:List or not.
> Note that the registry is defined to live at http://www.w3.org/ns/md, and uses http://www.w3.org/ns/md# as a prefix. The document is not actually loaded here at this point. I'm also exploring an RDF representation of the registry, which you can see here [3][4]. Note that in this case I'm using rdfs:range semantics to determine serialization, and I've suggested some schema.org properties that may want to use an rdf:List range.
> This version retains the <time> element, although the content model has not been updated to include the latest WHATWG version (duration, gYear, etc. equivalents).
> My Ruby (public domain) implementation is updated, and uses an internal version of the registry. It's available for download on GitHub [5] and a live running version is on my distiller [6].
> Comments appreciated.
> Gregg Kellogg
> [1] https://dvcs.w3.org/hg/htmldata/raw-file/default/ED/microdata-rdf/20111118/index.html
> [2] https://dvcs.w3.org/hg/htmldata/raw-file/default/microdata-rdf/index.html
> [3] https://dvcs.w3.org/hg/htmldata/raw-file/default/microdata-namespace/ns.ttl
> [4] https://dvcs.w3.org/hg/htmldata/raw-file/default/microdata-namespace/ns.jsonld
> [5] http://github.com/gkellogg/rdf-microdata
> [6] http://rdf.greggkellogg.net/distiller?in_fmt=microdata

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 21 November 2011 09:56:23 UTC
