Re: Mapping Microdata to RDF

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sat, 8 Oct 2011 22:19:37 +0100
Cc: "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Message-Id: <10576B6C-439B-49E3-A55A-BF7B36BEF904@jenitennison.com>
To: Gregg Kellogg <gregg@kellogg-assoc.com>

Gregg, thank you very much for doing this: you've made a great start.

On 8 Oct 2011, at 08:22, Gregg Kellogg wrote:
> I created a straw-man proposal for mapping Microdata to RDF [1]. Note that my Wiki-fu is not great, and any help in improving formatting, particularly for definition lists containing other definition lists, would be helpful.

I've fixed a few such formatting things where I noticed them.

> Step 1.1 adds a base URI rule not in the original spec, but I believe it is correct nonetheless.

My feeling is that where possible the mapping should rely on the definitions within HTML5 rather than restating how these values are derived. This will help ensure that special cases aren't missed and that implementers can use existing code to work out these kinds of values. So in this case, I'd use the definition of the document base URI from HTML5 [4].
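
As a rough illustration of what relying on the HTML5 definition buys an implementer, here is a minimal Python sketch of the document base URI. It is a simplification, not the HTML5 algorithm in full: it takes the first base element that has an href attribute and resolves it against the document's address, and it ignores special cases such as iframe srcdoc documents. The names BaseHrefFinder and document_base_uri are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefFinder(HTMLParser):
    """Records the href of the first <base> element that has one."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # Only the first <base> carrying an href attribute counts.
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href is not None:
                self.base_href = href


def document_base_uri(html, document_address):
    """Simplified document base URI (assumption: no srcdoc/about:blank cases)."""
    finder = BaseHrefFinder()
    finder.feed(html)
    if finder.base_href is None:
        # No usable <base>: fall back to the document's own address.
        return document_address
    # A relative href is resolved against the document's address.
    return urljoin(document_address, finder.base_href)
```

The point of the sketch is that the fallback and resolution rules live in one place, so the mapping can simply say "the document base URI" and leave the details to HTML5.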

> Steps 1.2-1.5 are pretty much from the 2011-05-25 version of the Microdata spec [2], with some editorial comments indicating that I think we should consider eliminating them.

I agree that we should eliminate them from the microdata -> RDF mapping. I think there's a role for a general HTML -> RDF mapping, which would pick up on the semantics encoded within HTML documents generally (e.g. titles, citations, metadata in the head), but a microdata -> RDF mapping should focus purely on resources identified as items through the itemscope attribute.

> Steps 1.6-1.8 are a new interpretation of the original steps for transforming @itemscope based items to RDF with changes reflecting what I believe is the current thinking, including allowing for multiple @itemtype values, deriving the @itemprop token URIs from the first @itemtype value, and placing multiple values of a single property in an RDF Collection (list).

I don't think we need to worry about multiple itemtypes until Hixie resolves the existing bug on supporting multiple types [5].

More generally, as with the document base URI above, I think we might do better to base the microdata -> RDF mapping on the microdata/HTML5 DOM API rather than on the HTML5 syntax. For example, refer to the item element's .properties rather than breaking apart the itemprop attribute. If the API doesn't provide enough information to create reasonable RDF, then we will need to raise bugs on it.

A few more specific things:

  * the time element can't have a duration value [6]
  * I'm not sure we should be ignoring properties that are neither absolute URIs nor on a typed item; perhaps we should be constructing URIs for them that look like {document base URI}#{property}?
  * it's not clear how the algorithm deals with properties whose values are URIs: do these become literal values or identify resources? (I think it should be the latter)
  * in step 3 of generating an RDF Collection, I think the object should be the blank node associated with the next element in the array rather than the next element in the array itself
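
To make the second and last points concrete, here is a small Python sketch. It is only an illustration under stated assumptions, not the wiki algorithm: triples are plain tuples, blank nodes are strings of the form _:bN, and the names property_uri and collection_triples are invented for this example. The {document base URI}#{property} pattern is the suggestion made above, not something the microdata spec defines.

```python
from urllib.parse import urldefrag

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF + "first"
RDF_REST = RDF + "rest"
RDF_NIL = RDF + "nil"


def property_uri(base_uri, name):
    """Hypothetical fallback: mint {document base URI}#{property}."""
    # Drop any fragment already on the base URI before appending ours.
    return f"{urldefrag(base_uri)[0]}#{name}"


def collection_triples(values):
    """Return (head, triples) for an RDF Collection holding values."""
    if not values:
        # The empty collection is just rdf:nil, with no triples.
        return RDF_NIL, []
    # One blank node per array element.
    nodes = [f"_:b{i}" for i in range(len(values))]
    triples = []
    for i, value in enumerate(values):
        # rdf:rest points at the NEXT element's blank node, not at the
        # next element itself; the last node points at rdf:nil.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((nodes[i], RDF_FIRST, value))
        triples.append((nodes[i], RDF_REST, rest))
    return nodes[0], triples
```

The returned head is what would be used as the object of the property triple, so a property with several values points at the first blank node of the list rather than at any one value.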

> I have not attempted to describe additional semantics for undefined attributes, such as proposed @itemvocab or @itemvaltype, but these could be added fairly easily.

We can't add @item* properties except by suggesting them in bug reports on microdata; the mapping needs to deal with microdata as it's defined by HTML5.

> I think we should consider placing this in a ReSpec document rather than keeping it on the Wiki, as ReSpec is much better for formatting such procedures.

Please do use ReSpec if that's easier. I did want to make sure that it was in a shared space while we're noodling over how it should work, that's all.

> My own Ruby implementation [3] has been updated (on GitHub, not yet released) to comply with these rules and passes my own test suite, which I'd be happy to try to turn into a standard W3C test suite, similar to that defined by Turtle.

Having some examples would be really useful. Perhaps you can add links to them from the wiki page?

Thanks again,

Jeni

> [1] http://www.w3.org/wiki/Mapping_Microdata_to_RDF
> [2] http://www.w3.org/TR/2011/WD-microdata-20110525
> [3] https://github.com/gkellogg/rdf-microdata
> 
[4] http://dev.w3.org/html5/spec/Overview.html#document-base-url
[5] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14233
[6] http://dev.w3.org/html5/spec/text-level-semantics.html#the-time-element
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Saturday, 8 October 2011 21:20:04 GMT