Re: Multiple itemtypes in microdata from Gregg Kellogg on 2011-10-20 (public-html-data-tf@w3.org from October 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Wed, 19 Oct 2011 22:03:13 -0400
To: Ian Hickson <ian@hixie.ch>
CC: Gregg Kellogg <gregg@kellogg-assoc.com>, Bradley Allen <bradley.p.allen@gmail.com>, Stéphane Corlosquet <scorlosquet@gmail.com>, "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Message-ID: <5448EA3B-DFE2-45F1-8E85-701BC7DDC901@greggkellogg.net>
On Oct 19, 2011, at 5:39 PM, Ian Hickson wrote:

> On Tue, 18 Oct 2011, Gregg Kellogg wrote:
>>>> 
>>>> The alternatives are:
>>>> 
>>>> 1) bake in support for each vocabulary into a conformant processor
>>> 
>>> This is the assumption that microdata is built around.
>> 
>> Doesn't scale, and requires a processor revision for each new 
>> vocabulary.
> 
> It clearly does scale; HTML is built on this principle, for example, and 
> that may be the world's most widely used vocabulary with literally 
> trillions of documents that use it.

Sorry, to be more specific: given potentially hundreds (or thousands) of different vocabularies, trying to define rules for each of them doesn't scale. Certainly, the use of any one vocabulary, such as vCard, can scale quite well. Whether or not there is value in supporting an unknown and open set of vocabularies is a different matter. My belief is that, in the context of using Microdata as a markup for RDF, we should allow Microdata to be used with an open set of vocabularies, like any other RDF serialization, with predicate URI generation rules using a flat namespace.

... Or not, if it is the sense of the task force to recommend Microdata for a narrow set of vocabularies with pre-defined semantics and the original URI generation algorithm.

>>>> 2) read a vocabulary document (i.e., RDFS or OWL) and determine 
>>>> processing rules from rdfs:range/rdfs:domain specifications
>>> 
>>> Generally speaking, no language exists that is expressive enough to 
>>> actually describe vocabularies in sufficient detail to make this 
>>> practical for the kinds of vocabularies that microdata's use cases 
>>> involve.
>> 
>> You say this, and yet a number of such vocabularies have, in fact, been 
>> created and are in use today. I'm unclear on what is special about the 
>> vocabularies described in HTML (vCard, vEvent, Licensing) that is so 
>> complicated that FOAF, schema.org, and Creative Commons haven't been 
>> able to get it right?
> 
> The schema.org vocabulary is defined in English.
> 
> But even with that, actually, the schema.org vocabulary is inadequately 
> defined. It doesn't have what I would call a specification. For example, 
> there's no conformance section defining the conformance classes. Or to 
> pick a random property: there's no rules saying that "wordCount" can't be 
> negative, and there's no conformance requirements on processors saying 
> what they should do if "wordCount" _is_ negative.

It's certainly evolving. I was presuming the schema.rdfs.org version, which is referenced from schema.org [1]. Even this could be defined more tightly, as you suggest. The point is that an OWL/RDFS-based vocabulary can specify these things unambiguously.

> The same applies to pretty much every RDF vocabulary I've ever seen. Where 
> is the conformance class description for FOAF? Where does it define what 
> to do if someone's age is described as negative? Where does it say how to 
> parse a birthday value? What if the birthday value is "02-30", is that 
> required to be ignored, treated as March 1st, March 2nd, cause the whole 
> agent to be treated as erroneous and dropped?

I suppose if vocabulary designers required this level of specificity to ensure interoperability, it would be further defined. The tools exist to make it unambiguous; vocabulary definitions, including FOAF, are still evolving. In any case, the specifics and requirements of vocabulary definition are really beyond the scope of this task force (except as they may involve processor rules, which is one reason why I'd rather stay away from this option).

>> [...]
>> 
>>> Any software that handled the above in equivalent ways (e.g. finding a 
>>> vCard with a name "Jill Darpa" in the second case) would be 
>>> non-conforming implementations of the vCard microdata vocabulary.
>> 
>> It could just mean that the vCard HTML vocabulary isn't compatible with 
>> the Microdata to RDF definition, in that case.
> 
> This isn't something specific to vCard. These two items:
> 
>   <p itemscope itemtype="data:,a#"><b itemprop=b>x</b></p>
>   <p itemscope itemtype="data:,a#"><b itemprop="data:,a#b">x</b></p>
> 
> ...state two different things, and treating them as equivalent would not 
> be conforming within a microdata context.

Well, this is really the crux of the matter. If the TF adopts a URI generation scheme that is incompatible with the HTML Microdata spec, we risk a formal objection that would block such a specification from being published as a recommendation. If we basically duplicate the original predicate URI generation algorithm, we ignore the needs of RDF vocabularies, and would have to recommend either that all property names be specified as full URIs or that Microdata not be used to represent data in these vocabularies. Provided we ensure that RDFa and Microdata can coexist in the same document (albeit with duplicate properties), this may be the best recommendation.
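
To make the conflict concrete, here is a minimal sketch of what a flat-namespace rule does to the two items quoted above. The function name flat_predicate and the "contains a colon" test for an absolute URL are simplifications invented for this illustration; this is not the algorithm from the HTML Microdata spec nor a proposal text.

```python
def flat_predicate(itemtype, name):
    """Hypothetical flat-namespace rule for turning an itemprop name into a predicate URI."""
    # Crude stand-in for "name is already an absolute URL" (assumption for this sketch).
    if ":" in name:
        return name
    # Short names are appended to the item type, which is assumed to end in '#' or '/'.
    return itemtype + name


# The two items from the quoted example share the itemtype "data:,a#".
first = flat_predicate("data:,a#", "b")            # itemprop=b
second = flat_predicate("data:,a#", "data:,a#b")   # itemprop="data:,a#b"

# Under this rule both properties collapse to the same predicate URI.
print(first == second)
```

Under such a rule the two properties become indistinguishable, which is exactly the equivalence that is described above as non-conforming within a microdata context.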

Another option might be to share prefix definitions with RDFa's default profile, so that authors can choose from a limited set of pre-defined vocabularies as @itemprop or @itemtype values. This has been criticized before as being too sophisticated for an HTML technology, but given that there's no way to define prefixes within a document, and the list of prefixes would be defined in a specification, this might, in fact, be reasonable; it would also help improve compatibility with RDFa. This could allow us to preserve the existing URI generation rules and allow CURIE/PNAME use for RDF vocabularies within Microdata.

For example, this could allow the following (assuming "schema:" is defined):

<address itemscope itemtype="schema:Person">
  Written by
  <span itemprop="schema:name">
    <span itemprop="schema:givenName">Jill</span>
    <span itemprop="schema:familyName">Darpa</span>
  </span>
</address>

A processor that didn't understand the prefix definition would end up treating these as absolute URIs with a "schema" URI scheme. Not ideal, but recoverable by an application that could do the mapping from a JSON representation after the fact.
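
The after-the-fact mapping could look something like the sketch below. It is only an illustration: the PREFIXES table, the expand and expand_item helpers, and the shape of the JSON item (a "type" list plus a "properties" map) are all assumptions made for the example, not something defined by the Microdata spec or by this task force.

```python
# Hypothetical prefix table; a real list would be fixed by a specification.
PREFIXES = {"schema": "http://schema.org/"}


def expand(term, prefixes=PREFIXES):
    """Expand 'prefix:local' if the prefix is known; otherwise return the term unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


def expand_item(item, prefixes=PREFIXES):
    """Rewrite the types and property names of one item, recursing into nested items."""
    return {
        "type": [expand(t, prefixes) for t in item.get("type", [])],
        "properties": {
            expand(name, prefixes): [
                expand_item(v, prefixes) if isinstance(v, dict) else v
                for v in values
            ]
            for name, values in item.get("properties", {}).items()
        },
    }


# Abbreviated version of the Person item above, as a prefix-unaware processor
# might have emitted it (assumed JSON shape).
item = {
    "type": ["schema:Person"],
    "properties": {
        "schema:givenName": ["Jill"],
        "schema:familyName": ["Darpa"],
    },
}

expanded = expand_item(item)
```

Names whose leading part is not in the table, such as ordinary http: URLs, pass through untouched, so the mapping only affects terms written with a known prefix.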

...

Gregg

[1] http://schema.org/docs/documents.html

> -- 
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 20 October 2011 02:04:23 UTC