Re: htmldata-ISSUE-1 (Microdata Vocabulary): Vocabulary specific parsing for Microdata from Gregg Kellogg on 2011-10-21 (public-html-data-tf@w3.org from October 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Fri, 21 Oct 2011 14:42:37 -0400
To: Edward O'Connor <eoconnor@apple.com>
CC: HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-ID: <36AC321A-7540-4CDF-B586-B516DC4E5594@greggkellogg.net>

On Oct 21, 2011, at 11:13 AM, Edward O'Connor wrote:

Hi,

I think ISSUE-1 is based on a misunderstanding. The issue is titled
"Vocabulary specific parsing for Microdata," which implies that the
Microdata parser requires vocabulary knowledge to do its thing. This is
not the case. The Microdata parsing algorithm is vocabulary-agnostic:

http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items

The Microdata spec (now) describes a way to extract JSON from HTML Microdata, placing properties in the appropriate item object. For the purposes of a mapping to RDF, the properties need to be given URIs. The crux of the issue is how to perform that mapping. Under the May W3C working draft of HTML Microdata, the mapping produces URIs that are incompatible with typical use in other vocabularies, such as schema.org. The issue is about settling the rules that determine the appropriate URI mapping.

Earlier in this thread, I noted the different ways in which URIs might be created:

<div itemscope itemtype="http://microformats.org/profile/hcard">
  <span itemprop="fn">Gregg Kellogg</span>
  <span itemprop="http://microformats.org/profile/fn">Ivan Herman</span>
  <span itemprop="foo" itemscope>
    <span itemprop="fn">Martin Hepp</span>
  </span>
</div>

The Microdata to RDF algorithm would generate the following:

<> <http://www.w3.org/1999/xhtml/microdata#item> [ a <http://microformats.org/profile/hcard>;
     <http://microformats.org/profile/fn> ("Gregg Kellogg" "Ivan Herman");
     <http://microformats.org/profile/foo> [ <http://microformats.org/profile/fn> """Martin Hepp
  """]] .

Using the previous algorithm:

<> <http://www.w3.org/1999/xhtml/microdata#item> [ a <http://microformats.org/profile/hcard>;
     <http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23:fn> "Gregg Kellogg";
     <http://microformats.org/profile/fn> "Ivan Herman";
     <http://microformats.org/profile/foo> [ <http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23foo%20fn> """Martin Hepp
  """]] .

So, for @itemprop="fn", we might see the following URIs generated:
* http://microformats.org/profile/fn
* http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23:fn
* http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard%23foo%20fn
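
To make the difference concrete, here is a rough Python sketch of the first two forms. It only mirrors the URIs shown in this message and is not the normative algorithm from either spec; the function names and the MICRODATA_PREFIX constant are made up for illustration, and the nested "foo fn" form is left out.

```python
from urllib.parse import urljoin, urlsplit

# Prefix seen in the URIs produced by the previous algorithm (as quoted in this thread).
MICRODATA_PREFIX = "http://www.w3.org/1999/xhtml/microdata#"


def vocab_relative_uri(item_type: str, itemprop: str) -> str:
    """Hypothetical helper: resolve the property name against the item type URI.

    An absolute itemprop is returned unchanged by urljoin.
    """
    return urljoin(item_type, itemprop)


def prefixed_uri(item_type: str, itemprop: str) -> str:
    """Hypothetical helper: build the top-level form seen with the previous algorithm.

    An absolute itemprop is used as is; otherwise the name is appended to the
    type URI after an escaped '#' (%23) and a ':'.
    """
    if urlsplit(itemprop).scheme:
        return itemprop
    return f"{MICRODATA_PREFIX}{item_type}%23:{itemprop}"


HCARD = "http://microformats.org/profile/hcard"

if __name__ == "__main__":
    print(vocab_relative_uri(HCARD, "fn"))
    print(prefixed_uri(HCARD, "fn"))
```

The first function yields a URI in the vocabulary's own namespace; the second yields a URI that no existing vocabulary describes.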

The first is compatible with schema.org, and most other RDF vocabularies, but is incompatible with the interpretation from the HTML spec.

If you apply this to typical schema.org examples, then unless absolute URIs are used for @itemprop values (which they typically are not in the wild), you end up generating property URIs that are not described within the schema.org vocabulary. This is where the "vocabulary aware" bit comes in.


Ted

(Doing something useful with the data that comes out the other end does
require some knowledge of the vocabulary's domain, but that's of course
true of all data formats.)

Gregg

Received on Friday, 21 October 2011 18:43:33 UTC