
Re: Mapping Microdata to RDF

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Mon, 10 Oct 2011 01:42:12 -0400
To: Jeni Tennison <jeni@jenitennison.com>
CC: "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Message-ID: <F2E4E89C-0FFF-4190-BD88-24030F28881D@greggkellogg.net>
On Oct 8, 2011, at 2:19 PM, Jeni Tennison wrote:

Gregg, thank you very much for doing this: you've made a great start.

On 8 Oct 2011, at 08:22, Gregg Kellogg wrote:
I created a straw-man proposal for mapping Microdata to RDF [1]. Note that my Wiki-fu is not great, and any help in improving formatting, particularly for definition lists containing other definition lists, would be helpful.

I've fixed a few such formatting things where I noticed them.

I'm creating the ReSpec version now, thanks.

...
Steps 1.6-1.8 are a new interpretation of the original steps for transforming @itemscope-based items to RDF, with changes reflecting what I believe is the current thinking: allowing multiple @itemtype values, deriving the @itemprop token URIs from the first @itemtype value, and placing multiple values of a single property in an RDF Collection (list).

I don't think we need to worry about multiple itemtypes until Hixie resolves the existing bug on supporting multiple types [5].

As a general point, in the same way as for the document base URI as discussed above, I think we might do better to base the microdata -> RDF mapping on the microdata/HTML5 DOM API rather than on the HTML5 syntax. For example, refer to the item element's .properties rather than breaking apart the itemprop attribute. If the API doesn't provide enough information to create reasonable RDF, then we will need to raise bugs on it.

Interesting that Hixie didn't do this himself. I'll tackle that on another pass.

A few more specific things:

 * the time element can't have a duration value [6]

Interesting. xsd:duration is used in schema.org examples, and there's some discussion in the WHATWG wiki [7], but it was never pushed forward. I wonder if we should file a bug: it's necessary for Recipe use cases, and there's no other way to get a typed literal into the spec.

 * I'm not sure we should be ignoring properties that are neither absolute URIs nor on a typed item; perhaps we should be constructing URIs for them that look like {document base URI}#{property}?

If there is an itemtype, a property value should either be an absolute URI or something that is appended to the type namespace. The issues about the lexical form of that are left to the base Microdata spec.
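
To make the rule concrete, here is a rough sketch in Python of how an itemprop token might be turned into a property URI. The function names are hypothetical, and the way the "type namespace" is cut out of the itemtype (everything up to the last "#" or "/") is purely an assumption for illustration; the actual lexical rule is left to the base Microdata spec.

```python
from urllib.parse import urlparse


def is_absolute_uri(token):
    """Treat a token as an absolute URI if it carries a scheme (e.g. http:)."""
    return bool(urlparse(token).scheme)


def property_uri(token, item_type=None):
    """Map an itemprop token to a property URI, or None if it is ignored.

    ASSUMPTION: the type namespace is the itemtype up to and including its
    last '#' or '/'. This is an illustration, not the spec's rule.
    """
    if is_absolute_uri(token):
        return token
    if item_type is None:
        # Neither an absolute URI nor on a typed item: ignored in this sketch.
        return None
    cut = max(item_type.rfind("#"), item_type.rfind("/"))
    return item_type[:cut + 1] + token
```

Under these assumptions, the token "name" on an item typed http://schema.org/Person would map to http://schema.org/name, while the same token on an untyped item would be dropped.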

 * it's not clear how the algorithm deals with properties whose values are URIs: do these become literal values or identify resources? (I think it should be the latter)

I'll make it clear that unless specifically identified as a typed literal, all values that are defined as absolute URIs are treated as URI references (if that's still the appropriate nomenclature). This should be clear from the property value section.

 * in step 3 of generating an RDF Collection, I think the object should be the blank node associated with the next element in the array rather than the next element in the array itself

Consider the following markup:

<div itemscope>
  <span itemprop="http://purl.org/dc/terms/title">foo</span>
  <span itemprop="http://purl.org/dc/terms/title">bar</span>
</div>

I believe that this should produce the following:

[ dc:title ("foo" "bar") ] .

If I understand you correctly, it would produce the following:

[ dc:title "foo", ("bar") ] .

This would have no way to order "foo" relative to ("bar"), and so I believe would not be correct (IMHO).
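
As a rough illustration of the list shape being discussed, the following Python sketch builds the triples for an RDF Collection from an ordered list of property values. It is not the algorithm from the wiki page: the function names, the tuple representation of triples, and the "_:bN" blank node labels are all hypothetical. Each rdf:rest points at the blank node for the next element (not at the element itself), and the last one points at rdf:nil.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def collection_triples(values, ids=None):
    """Return (head, triples) for an RDF Collection holding values in order."""
    ids = ids if ids is not None else count()
    if not values:
        # An empty collection is just rdf:nil.
        return RDF + "nil", []
    # One blank node per list element (hypothetical labelling scheme).
    nodes = [f"_:b{next(ids)}" for _ in values]
    triples = []
    for i, value in enumerate(values):
        triples.append((nodes[i], RDF + "first", value))
        # rdf:rest links to the NEXT element's blank node, or rdf:nil at the end.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


def property_triples(subject, prop, values, ids=None):
    """Attach all values of one property to the subject as a single list."""
    head, triples = collection_triples(values, ids)
    return [(subject, prop, head)] + triples
```

Calling property_triples with the two dc:title values from the markup above gives one triple from the item to the list head, followed by an rdf:first/rdf:rest pair per value, which is the ("foo" "bar") form.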

...

Having some examples would be really useful. Perhaps you can add links to them from the wiki page?

I'll add some examples to the ReSpec document, but we should also have a space for them on the wiki. We should probably turn [1] into a reference to the ReSpec document, discussion and examples.

Thanks again,

Jeni

[1] http://www.w3.org/wiki/Mapping_Microdata_to_RDF
[2] http://www.w3.org/TR/2011/WD-microdata-20110525
[3] https://github.com/gkellogg/rdf-microdata

[4] http://dev.w3.org/html5/spec/Overview.html#document-base-url
[5] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14233
[6] http://dev.w3.org/html5/spec/text-level-semantics.html#the-time-element
--
Jeni Tennison
http://www.jenitennison.com

Gregg

[7] http://wiki.whatwg.org/wiki/Time_element#duration
Received on Monday, 10 October 2011 05:43:12 GMT
