Re: Multiple itemtypes in microdata from Ian Hickson on 2011-12-08 (public-html-data-tf@w3.org from December 2011)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 8 Dec 2011 23:06:58 +0000 (UTC)
To: Gregg Kellogg <gregg@kellogg-assoc.com>, Ivan Herman <ivan@w3.org>
cc: Bradley Allen <bradley.p.allen@gmail.com>, Stéphane Corlosquet <scorlosquet@gmail.com>, "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Message-ID: <Pine.LNX.4.64.1112082249590.9078@ps20323.dreamhostps.com>

On Wed, 19 Oct 2011, Gregg Kellogg wrote:
> On Oct 19, 2011, at 5:39 PM, Ian Hickson wrote:
> > On Tue, 18 Oct 2011, Gregg Kellogg wrote:
> >>>> 
> >>>> The alternatives are:
> >>>> 
> >>>> 1) bake in support for each vocabulary into a conformant processor
> >>> 
> >>> This is the assumption that microdata is built around.
> >> 
> >> Doesn't scale, and requires a processor revision for each new 
> >> vocabulary.
> > 
> > It clearly does scale; HTML is built on this principle, for example, 
> > and that may be the world's most widely used vocabulary with literally 
> > trillions of documents that use it.
> 
> Sorry, to be more specific, given potentially hundreds (or thousands) of 
> different vocabularies, trying to define rules for each of them doesn't 
> scale.

True, but how many vocabularies are we really going to see on the Web?

I mean, look at document formats. Statistically, we basically have four of 
them in common use on the Web -- mainly HTML, with some plain text, some 
Word documents, and some PDF files. Browsers support two or three out of 
four of those natively, depending on the browser; search engines typically 
support all four.

Why would microdata vocabularies be different? Why would people use 
generic microdata processors that don't know about the vocabulary they 
care about? That doesn't seem like a likely common scenario.


> >>>> 2) read a vocabulary document (i.e., RDFS or OWL) and determine 
> >>>> processing rules from rdfs:range/rdfs:domain specifications
> >>> 
> >>> Generally speaking, no language exists that is expressive enough to 
> >>> actually describe vocabularies in sufficient detail to make this 
> >>> practical for the kinds of vocabularies that microdata's use cases 
> >>> involve.
> >> 
> >> You say this, and yet a number of such vocabularies have, in fact, 
> >> been created and are in use today. I'm unclear on what is special 
> >> about the vocabularies described in HTML (vCard, vEvent, Licensing) 
> >> that is so complicated that FOAF, schema.org, and Creative Commons 
> >> haven't been able to get it right?
> > 
> > The schema.org vocabulary is defined in English.
> > 
> > But even with that, actually, the schema.org vocabulary is 
> > inadequately defined. It doesn't have what I would call a 
> > specification. For example, there's no conformance section defining 
> > the conformance classes. Or to pick a random property: there are no 
> > rules saying that "wordCount" can't be negative, and there are no 
> > conformance requirements on processors saying what they should do if 
> > "wordCount" _is_ negative.
> 
> It's certainly evolving. I was presuming the schema.rdfs.org version, 
> which is referenced from schema.org [1]. Even this could be defined more 
> tightly, as you suggest. The point is that an OWL/RDFS-based vocabulary 
> can specify these things unambiguously.

It is not at all clear to me that that is the case.

For example, how do you use OWL or RDFS to say that if the endDate for a 
FoodEvent is before its startDate, the endDate should instead be assumed 
to be the startDate?
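
As a minimal sketch of what such a rule looks like when written as 
processor code rather than as a schema (the function name is 
hypothetical, and the rule is simply the one posed in the question 
above, not something schema.org defines):

```python
from datetime import date


def effective_end_date(start: date, end: date) -> date:
    """Hypothetical repair rule: an endDate that precedes the startDate
    is assumed to be the startDate instead."""
    return start if end < start else end


# An event claiming to end a week before it starts.
repaired = effective_end_date(date(2011, 12, 8), date(2011, 12, 1))
```

The rule is a conditional over two property values of the same item, 
which is the kind of error-handling behaviour the question is about.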

Or that if a SaleEvent has as its subEvent a Festival that itself has as 
its subEvent the SaleEvent, the Festival should be assumed to be the 
parent event, and the SaleEvent's claim that it has a Festival subEvent 
should be ignored?

Or that in a Rating, if ratingValue is greater than bestRating it should 
be treated as bestRating, but that if it is lower than worstRating the 
whole Rating should be treated as bogus?
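
The asymmetric Rating rule could be sketched like this, assuming the 
caller has already resolved worstRating and bestRating to numbers (the 
function name and the use of None for "bogus" are assumptions of the 
sketch):

```python
from typing import Optional


def normalize_rating(value: float, worst: float, best: float) -> Optional[float]:
    """Hypothetical processing rule for a Rating item.

    A ratingValue above bestRating is clamped to bestRating; one below
    worstRating makes the whole Rating bogus, signalled here as None.
    """
    if value < worst:
        return None
    return min(value, best)
```

For example, normalize_rating(7, 1, 5) gives 5, while 
normalize_rating(0, 1, 5) gives None and the caller would drop the item.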

Or how to parse the value of openingHours of a LocalBusiness?


> > The same applies to pretty much every RDF vocabulary I've ever seen. 
> > Where is the conformance class description for FOAF? Where does it 
> > define what to do if someone's age is described as negative? Where 
> > does it say how to parse a birthday value? What if the birthday value 
> > is "02-30", is that required to be ignored, treated as March 1st, 
> > treated as March 2nd, or cause the whole agent to be treated as 
> > erroneous and dropped?
> 
> I suppose if vocabulary designers required this level of specificity to 
> ensure interoperability, it would be further defined.

HTML has long needed this level of specificity, but we didn't get around 
to trying to define it for 14 years.

Vocabularies for RDF, microdata, and microformats need this too. Just 
because nobody has done it doesn't mean it's not needed.
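
To make the birthday case from the quoted text concrete, here is a 
sketch of one possible policy, namely ignoring values that are not real 
calendar days. The "MM-DD" syntax, the function name, and the choice of 
"ignore" over the other options are all assumptions; the point above is 
that no specification picks one.

```python
import re
from datetime import date
from typing import Optional, Tuple


def parse_birthday(value: str) -> Optional[Tuple[int, int]]:
    """Return (month, day) for an "MM-DD" string, or None if the value
    is malformed or names a day that does not exist (e.g. "02-30")."""
    match = re.fullmatch(r"(\d{2})-(\d{2})", value, flags=re.ASCII)
    if match is None:
        return None
    month, day = int(match.group(1)), int(match.group(2))
    try:
        # 2000 is a leap year, so "02-29" is accepted as a birthday.
        date(2000, month, day)
    except ValueError:
        return None
    return (month, day)
```

A processor that instead rolled "02-30" over to a March date would 
return a different answer for the same input, which is exactly the 
interoperability gap.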


> > This isn't something specific to vCard. These two items:
> > 
> >   <p itemscope itemtype="data:,a#"><b itemprop=b>x</b></p>
> >   <p itemscope itemtype="data:,a#"><b itemprop="data:,a#b">x</b></p>
> > 
> > ...state two different things, and treating them as equivalent would 
> > not be conforming within a microdata context.
> 
> Well, this is really the crux of the matter. If the TF adopts a URI 
> generation scheme that is incompatible with the HTML Microdata spec, we 
> risk a formal objection disallowing such a specification from being 
> published as a recommendation.

More importantly, you risk corrupting data.
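
A small sketch of how that corruption happens with the two items quoted 
above. Both helper functions are hypothetical: the first keeps 
vocabulary-defined names and URL names apart, the second is a naive URI 
generation scheme that appends short names to the item type. Treating 
anything with a URL scheme as an absolute URL is a simplification.

```python
from urllib.parse import urlsplit


def is_absolute_url(name: str) -> bool:
    # Simplification: any name with a URL scheme counts as absolute.
    return bool(urlsplit(name).scheme)


def microdata_key(itemtype: str, itemprop: str) -> tuple:
    """Hypothetical key that keeps the two kinds of name distinct."""
    if is_absolute_url(itemprop):
        return ("url", itemprop)
    return ("vocab", itemtype, itemprop)


def naive_uri(itemtype: str, itemprop: str) -> str:
    """Hypothetical lossy scheme: short names are appended to the type."""
    return itemprop if is_absolute_url(itemprop) else itemtype + itemprop


ITEMTYPE = "data:,a#"
distinct = microdata_key(ITEMTYPE, "b") != microdata_key(ITEMTYPE, "data:,a#b")
collided = naive_uri(ITEMTYPE, "b") == naive_uri(ITEMTYPE, "data:,a#b")
```

Here both `distinct` and `collided` are true: the items differ under the 
first function, but the naive scheme maps both properties to the same 
URI and the difference is lost.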


> Another option might be to share prefix definitions with RDFa's default 
> profile, so that authors can choose from a limited set of pre-defined 
> vocabularies as @itemprop or @itemtype values. This has been criticized 
> before as being too sophisticated for an HTML technology, but given that 
> there's no way to define prefixes within a document, and the list of 
> prefixes would be defined in a specification, this might, in fact, be 
> reasonable; it would also help improve compatibility with RDFa. This 
> could allow us to preserve the existing URI generation rules and allow 
> CURIE/PNAME use for RDF vocabularies within Microdata.
> 
> For example, this could allow the following (assuming "schema:" is 
> defined):
> 
> <address itemscope itemtype="schema:Person">
>   Written by
>   <span itemprop="schema:name">
>     <span itemprop="schema:givenName">Jill</span>
>     <span itemprop="schema:familyName">Darpa</span>
>   </span>
> </address>
> 
> A processor that didn't understand the prefix definition would end up 
> treating these as absolute URIs with a "schema" URI scheme. Not ideal, 
> but recoverable by an application that could do the mapping from a JSON 
> representation after the fact.

I don't think making microdata uglier to enable prettier RDF predicate 
URLs is the right compromise. :-)
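
For reference, the fallback behaviour described in the quoted proposal 
can be sketched as follows. The prefix table and the function name are 
hypothetical; the proposal only says that the list of prefixes would be 
defined in a specification.

```python
from urllib.parse import urlsplit

# Hypothetical pre-defined prefix table; not taken from any specification.
PREFIXES = {"schema": "http://schema.org/"}


def expand_after_the_fact(name: str) -> str:
    """Expand a known prefix, otherwise return the name unchanged."""
    parts = urlsplit(name)
    base = PREFIXES.get(parts.scheme)
    if base is None:
        return name
    # Everything after the first "prefix:" is the local name.
    return base + name[len(parts.scheme) + 1:]


# A prefix-unaware processor just sees a URL whose scheme is "schema".
scheme_seen = urlsplit("schema:givenName").scheme
expanded = expand_after_the_fact("schema:givenName")
```

The recovery step works only if every consumer carries the same prefix 
table, which is the cost being weighed here.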

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 8 December 2011 23:07:25 UTC