Re: Microdata to RDF: First Editor's Draft (ACTION-6) from Jeni Tennison on 2011-10-13 (public-html-data-tf@w3.org from October 2011)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Thu, 13 Oct 2011 17:57:41 +0100
To: Gregg Kellogg <gregg@kellogg-assoc.com>
Cc: "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Message-Id: <1F9AA5C3-F0FF-4A5F-ACB5-CA7934E10981@jenitennison.com>

Gregg,

On 12 Oct 2011, at 20:26, Gregg Kellogg wrote:
> I have created a draft of the Microdata to RDF transformation and uploaded it to our Mercurial repository [1]. Note that the links to the draft go to the repository, and the actual draft can be used by selecting the "raw" form of the document [2]. (The lack of a current checked-out version of the Mercurial repository that can be used for direct references should be addressed at some point).

Thanks, that's looking good.

> Notable changes between this draft and the algorithm given in [3]:
> 
> * If a page has more than one top-level item, they are expressed in an RDF Collection to preserve original item order.
> * If an item property has more than one value, all values are expressed in an RDF Collection to preserve original value order.

It might be worth saying, perhaps in the introduction, that the goal of the mapping is to balance preserving information from the original microdata against creating idiomatic RDF. You might also say that the results of the conversion may need some vocabulary-specific mapping after extraction (which might include assigning datatypes to values, mapping collections to repeated properties and so on). (Let me know if you want me to put words together for that.)

The other thing about collections is that it looks as though you've based whether or not to create a collection for the values of a property purely on the values of the particular instance of the property. An alternative would be that if the property is used with multiple values *anywhere* in the page, it should create a collection (possibly with a single value) for consistency.

For example, if you have:

  <p itemscope itemtype="http://example.org/Book">
    A Book written by <span itemprop="author">A.N. Author</span>
  </p>
  <p itemscope itemtype="http://example.org/Book">
    Another Book written by <span itemprop="author">A.N. Author</span> and <span itemprop="author">A.N. Other</span>
  </p>

then I think you should get:

  @prefix eg: <http://example.org/> .
  [] a eg:Book ;
    eg:author ("A.N. Author") ;
    .
  [] a eg:Book ;
    eg:author ("A.N. Author" "A.N. Other") ;
    .

rather than:

  @prefix eg: <http://example.org/> .
  [] a eg:Book ;
    eg:author "A.N. Author" ;
    .
  [] a eg:Book ;
    eg:author ("A.N. Author" "A.N. Other") ;
    .

What do you think?
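
In case it helps, here is a rough Python sketch of the page-wide rule I have in mind. It is only an illustration: the function names are made up, a plain list stands in for an RDF Collection, and each item is reduced to a dict of property name to values in document order.

```python
def multi_valued_properties(items):
    """Names of properties that have more than one value on at least one item."""
    return {
        name
        for item in items
        for name, values in item.items()
        if len(values) > 1
    }


def shape_values(items):
    """Wrap values in a list (standing in for an RDF Collection) page-wide."""
    as_collection = multi_valued_properties(items)
    shaped = []
    for item in items:
        shaped.append({
            # A property that is multi-valued anywhere is a collection everywhere.
            name: list(values) if name in as_collection else values[0]
            for name, values in item.items()
        })
    return shaped


# The two books from the example above: property name -> values in document order.
books = [
    {"author": ["A.N. Author"]},
    {"author": ["A.N. Author", "A.N. Other"]},
]

for book in shape_values(books):
    print(book)
```

The first book comes out with a one-element list for author, which is the consistency I'm after; a property that never has more than one value on the page stays a plain value.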

> * @itemprop names which are not absolute URIs are resolved as relative URIs either to @itemtype or Document base.
> * Resolving @itemprop names against @itemtype uses a modified algorithm using everything after "/" or "#" in the type URI.

Hixie rightly points out in [5]:


> Note that the property "name" in the vocabulary "http://example.org/feline"
> and the property "http://example.org/feline#name" have absolutely no
> relationship in microdata. They are different properties and cannot be 
> mechanically considered to be equivalent in any way. Any use of microdata 
> that claims that a full URL property name is the same property as a short 
> name in a specific vocabulary is wrong. It's two properties. They might 
> have the same semantics and can be used as equivalent, but they are 
> different properties and any specification that defines or uses both would 
> need to define how to handle clashes.


There are two things that come out of that.

First is that the microdata-RDF mapping spec should flag up that the generation of property names used in the spec is a wilful violation [6] of the microdata specification to create URIs which are recognisable to the users of most existing vocabularies. (The only one I know that doesn't adhere to the pattern used in the microdata/RDF mapping is the hCalendar vocabulary in the WHATWG microdata spec.)
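
To be concrete about the pattern I mean, here is a minimal Python sketch of the property URI generation as I read the draft. The function name and the example.org URLs are hypothetical, and the draft's actual algorithm may differ in its details.

```python
from urllib.parse import urljoin, urlsplit


def property_uri(name, item_type=None, base="http://example.org/page.html"):
    """Expand an @itemprop name to a URI (an approximation of the draft's rule)."""
    if urlsplit(name).scheme:
        # Already an absolute URL: use it unchanged.
        return name
    if item_type is None:
        # No @itemtype: resolve against the document base instead.
        return urljoin(base, name)
    # Keep the type URI up to and including its last "/" or "#",
    # then append the short name.
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    return item_type[: cut + 1] + name


print(property_uri("author", "http://example.org/Book"))
print(property_uri("name", "http://example.org/vocab#Cat"))
```

With the Book type used earlier, "author" becomes http://example.org/author, which is exactly the kind of equivalence between a short name and a full URL that the microdata spec says does not exist.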

Second is that the spec needs to be clear about what happens when a short name for a property is turned into a URL that is also used in its full form in a property on that same item. My suggestion would be that the values are merged; the difficulty with that is in preserving the order of the values. Perhaps get the relevant property elements and sort them into document order before extracting their values?
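
Something like the following Python sketch is what I'm suggesting for the merge. It assumes each property element has already been reduced to a (document position, name, value) tuple, and the expand helper is a hypothetical stand-in for whatever name-to-URI rule the spec ends up with.

```python
def expand(name, item_type):
    """Hypothetical stand-in: short names hang off the type URI's last '/' or '#'."""
    if "://" in name:
        return name
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    return item_type[: cut + 1] + name


def merge_property_values(elements, item_type):
    """Group values by expanded property URI, keeping document order."""
    merged = {}
    # Sort by document position first so that values reached through the short
    # name and through the full URL interleave in their original order.
    for _position, name, value in sorted(elements, key=lambda e: e[0]):
        merged.setdefault(expand(name, item_type), []).append(value)
    return merged


# Property elements of one item, deliberately listed out of document order.
elements = [
    (2, "http://example.org/author", "A.N. Other"),
    (1, "author", "A.N. Author"),
]

print(merge_property_values(elements, "http://example.org/Book"))
```

Both values end up under http://example.org/author, with "A.N. Author" first because its element comes first in the document.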

> * The property value definition is updated as follows:
>   * Values are returned as Literal, URI Reference or Blank Node
>   * Time elements with a @datetime attribute uses a lexical matching against xsd:date, xsd:time, and xsd:dateTime to create appropriate typed literal
>   * Plain literals get language from elements' in-scope @lang
>   * blockquote and q with @cite attribute generate a URI Reference value

I don't think that the cite attributes should generate a URI reference value; that's not the item value according to the microdata rules, and I think it would be confusing for a <blockquote> or <q> to generate different values in the microdata parse from the generated RDF.
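
For reference, this is roughly how I understand the microdata property value rules, as a Python sketch. The function and its inputs (tag name, attribute dict, text content) are my own simplification; it ignores nested items (itemscope) and URL resolution.

```python
# Elements whose property value comes from a URL-valued attribute.
URL_ATTRIBUTE = {
    "a": "href", "area": "href", "link": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
}


def property_value(tag, attrs, text):
    """Simplified microdata property value for an element without itemscope."""
    if tag == "meta":
        return attrs.get("content", "")
    if tag in URL_ATTRIBUTE:
        return attrs.get(URL_ATTRIBUTE[tag], "")
    if tag == "time" and "datetime" in attrs:
        return attrs["datetime"]
    # Everything else, including blockquote and q, uses its text content;
    # the cite attribute plays no part.
    return text


print(property_value("blockquote", {"cite": "http://example.org/source"}, "Some quoted text"))
```

A microdata parser following these rules reports the quoted text as the value, so an RDF mapping that reported the cite URL instead would disagree with it.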

Cheers,

Jeni

> [1] https://dvcs.w3.org/hg/htmldata/
> [2] https://dvcs.w3.org/hg/htmldata/raw-file/24af1cde0da1/microdata-rdf/index.html
> [3] http://www.w3.org/TR/2011/WD-microdata-20110525/
> [4] http://dev.w3.org/html5/md/Overview.html

[5] http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0067.html
[6] http://www.w3.org/TR/html5/introduction.html#willful-violation

-- 
Jeni Tennison
http://www.jenitennison.com
Received on Thursday, 13 October 2011 16:58:05 UTC