Re: Updated Microdata to RDF spec from Jeni Tennison on 2011-11-24 (public-html-data-tf@w3.org from November 2011)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Thu, 24 Nov 2011 18:13:37 +0000
To: Ivan Herman <ivan@w3.org>
Cc: Gregg Kellogg <gregg@kellogg-assoc.com>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-Id: <CCF94FFC-66EE-470D-A23D-A4C929ABC002@jenitennison.com>

Ivan,

On 22 Nov 2011, at 08:22, Ivan Herman wrote:
> I personally would like to aim at a document that implementers can pick up today. Ie, if we want to preserve the notion of registries, a possibility would be to put it into an appendix, making it clear that the goal of that is to preserve discussions.

I think we would all prefer not to have a registry. But I cannot see a way of resolving the requirements on this mapping such that we can do without one.

Our main constraint is that there's not much point having a mapping that doesn't do something sensible with schema.org microdata, but schema.org has three features that make that difficult:

1. schema.org's extension mechanism means you can't just assume that you can take the last path segment off a type URL to get a vocabulary URL
2. schema.org has some properties (itemListElement, for example) where the ordering of values really needs to be preserved, but a bunch of others where it really doesn't
3. schema.org is not a static, standardised vocabulary; it is possible, nay likely, that more properties that require ordering will be added in the future
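
To illustrate point 1, here is a minimal Python sketch of the "take the last path segment off" heuristic. The function name `naive_vocab_url` is hypothetical and is not taken from any spec or from Toby's algorithm; it only shows why the shortcut breaks for extension types such as http://schema.org/Person/Teacher.

```python
def naive_vocab_url(type_url: str) -> str:
    """Guess a vocabulary URL by dropping the last path segment.

    Hypothetical helper for illustration only.
    """
    head, _, _ = type_url.rpartition("/")
    return head + "/"


# A core type gives the expected vocabulary URL.
core = naive_vocab_url("http://schema.org/Person")

# An extension type gives a different URL, although schema.org treats
# everything under http://schema.org/ as a single vocabulary.
extension = naive_vocab_url("http://schema.org/Person/Teacher")

print(core)       # http://schema.org/
print(extension)  # http://schema.org/Person/
```

Under this heuristic the two types would appear to belong to different vocabularies, so properties on a Teacher item would be mapped to different URIs than the same properties on a Person item.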

The only proposal I have seen for identifying an appropriate vocabulary URI without a registry was Toby's heuristic algorithm [1]. Both Dan and Lin pushed back against that approach because it relies on various naming (and particularly case) conventions in type URIs. Another option, as Gregg said, is to use something less dependent on naming conventions and provide guidance for publishers to use schema.org in particular ways so that RDF processors can get reasonable data -- advice I doubt any publisher will have any motivation to follow.

The proposals that I've seen about how to work out when to create lists and when to create repeated properties have basically been:

  1. ignore it -- this means people cannot reconstitute the original ordering of the values, so it doesn't cover the schema.org properties where order matters
  2. provide an ordered (List value) property as well as all the values in individual triples -- this is misleading, breaks property ranges, and makes the data hard to process, even if the ordered and unordered variants are in separate graphs
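
The difference between the two output shapes can be sketched in Python with triples as plain tuples. This is only an illustration under stated assumptions: the subject `_:list`, the property URI, the blank node labels and both function names are hypothetical, and no RDF library is used.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def repeated_triples(subject, prop, values):
    """One triple per value; a set of triples carries no ordering."""
    return {(subject, prop, v) for v in values}


def list_triples(subject, prop, values):
    """A single property pointing at an rdf:List (rdf:first / rdf:rest chain)."""
    if not values:
        return [(subject, prop, RDF + "nil")]
    nodes = [f"_:b{i}" for i in range(len(values))]  # hypothetical blank nodes
    triples = [(subject, prop, nodes[0])]
    for i, value in enumerate(values):
        rest = nodes[i + 1] if i + 1 < len(values) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", value))
        triples.append((nodes[i], RDF + "rest", rest))
    return triples


PROP = "http://schema.org/itemListElement"  # assumed property URI

# Repeated properties: document order is lost.
print(repeated_triples("_:list", PROP, ["a", "b"]) ==
      repeated_triples("_:list", PROP, ["b", "a"]))

# List value: document order is preserved in the rdf:first/rdf:rest chain.
print(list_triples("_:list", PROP, ["a", "b"]) ==
      list_triples("_:list", PROP, ["b", "a"]))
```

Emitting both shapes for the same property, as in proposal 2, would give that property both literal values and a list node as objects, which is where the problem with property ranges comes from.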

Or we could hard-code support for schema.org into the spec, but (a) I don't think any of us feel that's appropriate for a W3C Rec-track document and (b) schema.org is likely to change so it would soon get out of date.

At a process level, in my opinion Gregg's document is more helpful with the registry concept than without, both for clarity and because it will be far easier for a future WG to take out the concept of the registry than for them to introduce it to a document that lacks it.

I also think that it is implementable now, and I believe Gregg has implemented it himself. At this stage, I think it would be useful for other implementers to give feedback on whether it works in their environment. I will elicit that.

>> <div itemscope itemtype="http://schema.org/Person http://schema.org/Person/Teacher">
>> <p itemprop="name">Ivan Herman</p>
>> </div>
>> 
> 
> Is such multiple type allowed in microdata? Not at the moment, I presume.

Yes, that is allowed now, so long as the two types are "in the same vocabulary". Schema.org defines its vocabulary as anything under http://schema.org/.
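
As a rough sketch of what "in the same vocabulary" could mean for a processor, the following Python assumes a hypothetical list of known vocabulary URL prefixes (`VOCAB_PREFIXES`) and checks that every type in an itemtype value falls under the same one. The names and the prefix-matching rule are assumptions made for illustration, not the registry format in Gregg's document.

```python
# Hypothetical list of known vocabulary prefixes; only the schema.org
# entry is grounded in the statement above.
VOCAB_PREFIXES = ["http://schema.org/"]


def vocabulary_for(type_url):
    """Return the longest known prefix that the type URL starts with, or None."""
    matches = [p for p in VOCAB_PREFIXES if type_url.startswith(p)]
    return max(matches, key=len) if matches else None


def same_vocabulary(itemtype):
    """True if all space-separated types share one known vocabulary."""
    vocabs = {vocabulary_for(t) for t in itemtype.split()}
    return len(vocabs) == 1 and None not in vocabs


print(same_vocabulary("http://schema.org/Person http://schema.org/Person/Teacher"))
print(same_vocabulary("http://schema.org/Person http://example.com/Thing"))
```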

Cheers,

Jeni

[1] http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0246.html
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Thursday, 24 November 2011 18:14:15 UTC