Re: Multiple itemtypes in microdata from Ian Hickson on 2011-10-20 (public-html-data-tf@w3.org from October 2011)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 20 Oct 2011 00:39:42 +0000 (UTC)
To: Gregg Kellogg <gregg@kellogg-assoc.com>
cc: Bradley Allen <bradley.p.allen@gmail.com>, Stéphane Corlosquet <scorlosquet@gmail.com>, "public-html-data-tf@w3.org" <public-html-data-tf@w3.org>
Message-ID: <Pine.LNX.4.64.1110200005030.21128@ps20323.dreamhostps.com>
On Tue, 18 Oct 2011, Gregg Kellogg wrote:
> >>
> >> The alternatives are:
> >> 
> >> 1) bake in support for each vocabulary into a conformant processor
> > 
> > This is the assumption that microdata is built around.
> 
> Doesn't scale, and requires a processor revision for each new 
> vocabulary.

It clearly does scale; HTML, for example, is built on this principle, and 
it may be the world's most widely used vocabulary, with literally 
trillions of documents using it.


> >> 2) read a vocabulary document (i.e., RDFS or OWL) and determine 
> >> processing rules from rdfs:range/rdfs:domain specifications
> > 
> > Generally speaking, no language exists that is expressive enough to 
> > actually describe vocabularies in sufficient detail to make this 
> > practical for the kinds of vocabularies that microdata's use cases 
> > involve.
> 
> You say this, and yet a number of such vocabularies have, in fact, been 
> created and are in use today. I'm unclear on what is special about the 
> vocabularies described in HTML (vCard, vEvent, Licensing) that is so 
> complicated that FOAF, schema.org, and Creative Commons haven't been 
> able to get it right?

The schema.org vocabulary is defined in English.

But even so, the schema.org vocabulary is inadequately defined. It 
doesn't have what I would call a specification. For example, there's no 
conformance section defining the conformance classes. Or, to pick a random 
property: there are no rules saying that "wordCount" can't be negative, 
and there are no conformance requirements on processors saying what they 
should do if "wordCount" _is_ negative.

The same applies to pretty much every RDF vocabulary I've ever seen. Where 
is the conformance class description for FOAF? Where does it define what 
to do if someone's age is described as negative? Where does it say how to 
parse a birthday value? If the birthday value is "02-30", is it required 
to be ignored, treated as March 1st or March 2nd, or does it cause the 
whole agent to be treated as erroneous and dropped?
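
To illustrate why the answer has to be written down somewhere, here is a 
small Python sketch. It is an assumption-laden illustration, not anything 
FOAF specifies: strict_parse and lenient_parse are hypothetical policies. 
The strict one rejects an impossible month-day value like "02-30"; the 
lenient "rollover" one counts overflow days into the next month. Because 
"02-30" carries no year, rollover lands on March 1st in a leap year and 
March 2nd otherwise, so two consumers can disagree even when both choose 
to roll over.

```python
from datetime import date, timedelta


def strict_parse(year, month, day):
    """Hypothetical strict policy: reject impossible dates (returns None)."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def lenient_parse(year, month, day):
    """Hypothetical lenient policy: roll overflow days into the next month."""
    return date(year, month, 1) + timedelta(days=day - 1)


# "02-30" has no year, so the consumer must also guess one.
for year in (2000, 2001):  # 2000 is a leap year, 2001 is not
    print(year, strict_parse(year, 2, 30), lenient_parse(year, 2, 30))
```

Neither policy is "right"; the point is only that without a conformance 
requirement, each is a defensible reading of the same data.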


> > I don't really understand what this means. What does RDF have to do 
> > with microdata in this context?
> 
> In the context of a Microdata to RDF transformation, I would think that 
> would be obvious.

Oh, I didn't realise that was what we were talking about. I thought the 
topic was the issue of a microdata item having multiple types.

I don't really have anything constructive to say about mapping microdata 
to RDF.


> > This has implications. For example, it would be invalid to treat these two 
> > microdata fragments as equivalent in any way:
> > 
> >   <address itemscope itemtype="http://microformats.org/profile/hcard">
> >    Written by
> >    <span itemprop="fn">
> >     <span itemprop="n" itemscope>
> >      <span itemprop="given-name">Jill</span>
> >      <span itemprop="family-name">Darpa</span>
> >     </span>
> >    </span>
> >   </address>
> > 
> >   <address itemscope itemtype="http://microformats.org/profile/hcard">
> >    Written by
> >    <span itemprop="http://microformats.org/profile/hcard#fn">
> >     <span itemprop="http://microformats.org/profile/hcard#n" itemscope>
> >      <span itemprop="http://microformats.org/profile/hcard#n/given-name">Jill</span>
> >      <span itemprop="http://microformats.org/profile/hcard#n/family-name">Darpa</span>
> >     </span>
> >    </span>
> >   </address>
> 
> [...]
>
> > Any software that handled the above in equivalent ways (e.g. finding a 
> > vCard with a name "Jill Darpa" in the second case) would be 
> > non-conforming implementations of the vCard microdata vocabulary.
> 
> It could just mean that the vCard HTML vocabulary isn't compatible with 
> the Microdata to RDF definition, in that case.

This isn't something specific to vCard. These two items:

   <p itemscope itemtype="data:,a#"><b itemprop=b>x</b></p>
   <p itemscope itemtype="data:,a#"><b itemprop="data:,a#b">x</b></p>

...state two different things, and treating them as equivalent would not 
be conforming within a microdata context.
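
To make the distinction concrete, here is a minimal Python sketch, 
assuming a single flat item with no nested itemscope. It collects the 
literal itemprop tokens from each fragment above, then applies naive_uri, 
a hypothetical mapping (not the microdata specification's rules, and not 
any published microdata-to-RDF algorithm) that appends non-absolute names 
to the itemtype URL. The raw property names differ, but the naive mapping 
produces the same string for both, which is exactly the equivalence a 
conforming microdata consumer must not draw.

```python
from html.parser import HTMLParser


class ItemPropCollector(HTMLParser):
    """Records the itemtype of the first itemscope element and every raw
    itemprop token. Simplification: assumes one flat item, no nesting."""

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.props = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a and self.itemtype is None:
            self.itemtype = a.get("itemtype")
        if a.get("itemprop"):
            # itemprop is a space-separated list of names, kept verbatim.
            self.props.extend(a["itemprop"].split())


def raw_names(fragment):
    collector = ItemPropCollector()
    collector.feed(fragment)
    collector.close()
    return collector.itemtype, collector.props


def naive_uri(itemtype, name):
    # Hypothetical mapping, NOT from any spec: treat anything containing
    # ":" as already absolute, otherwise append it to the type URL.
    return name if ":" in name else itemtype + name


first = '<p itemscope itemtype="data:,a#"><b itemprop=b>x</b></p>'
second = '<p itemscope itemtype="data:,a#"><b itemprop="data:,a#b">x</b></p>'

type1, props1 = raw_names(first)
type2, props2 = raw_names(second)
print(props1, props2)  # the microdata property names differ
print(naive_uri(type1, props1[0]), naive_uri(type2, props2[0]))  # these match
```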


> Although, I'm afraid I still don't understand the specific requirements 
> that make it so, given the ability to indicate domain and range with 
> RDFS.

Well, I guess in RDF with OWL you could define any two URLs as meaning the 
same thing. I mean, one could say:

   rdfs:subClassOf owl:sameAs rdfs:isDefinedBy .
   rdfs:domain owl:sameAs rdfs:range .
   foaf:Person owl:sameAs rdf:type .

...or indeed:

   owl:sameAs owl:sameAs owl:inverseOf .

So sure, once you've converted microdata items to RDF, I guess the issue 
is kind of moot, since you could even treat a vCard as being equivalent 
to a vEvent, to say nothing of treating an undefined URL as being 
equivalent to a particular defined vCard property.
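
A minimal Python sketch of why the question becomes moot once everything 
is RDF: a naive closure that treats owl:sameAs as symmetric and transitive 
(a plain union-find, not the behavior of any particular OWL reasoner) 
merges whatever it is told to merge. Terms are compared as opaque prefixed 
strings, and the vcard:/ical: assertions are hypothetical examples in the 
spirit of the triples above.

```python
def same_as_classes(triples):
    """Group terms into equivalence classes under owl:sameAs, treating it
    as symmetric and transitive via union-find. Other predicates are ignored."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)

    classes = {}
    for term in list(parent):
        classes.setdefault(find(term), set()).add(term)
    return list(classes.values())


triples = [
    ("rdfs:domain", "owl:sameAs", "rdfs:range"),
    # Hypothetical assertions: nothing stops someone from publishing these.
    ("vcard:VCard", "owl:sameAs", "ical:Vevent"),
    ("ical:Vevent", "owl:sameAs", "data:,a#b"),
]
for cls in same_as_classes(triples):
    print(sorted(cls))
```

The closure has no notion of which vocabulary defines which term, which is 
why it cannot by itself preserve the distinctions microdata draws.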

The equivalence doesn't exist in the underlying microdata. Any tool that 
used RDF as an implementation detail to implement a feature on top of 
microdata would be non-conforming if it treated properties from one 
vocabulary as being equivalent to absolute URLs that that vocabulary does 
not explicitly define.


> > (This is why when there was a generic HTML to RDF conversion algorithm 
> > in the HTML spec, it went to some lengths to ensure that the URLs 
> > generated on the RDF side could not be present in conforming microdata 
> > -- it ensured that there was no way to end up in this confusing 
> > situation where two different conforming property names had the same 
> > semantic.)
> 
> Yet in a way that was broadly considered unsatisfying to the majority of 
> RDF consumers. That is why the HTML Data task force ended up taking this 
> on.

Just because it is unsatisfying doesn't mean the requirement isn't there. 
Aesthetics can't trump technical soundness.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 20 October 2011 00:43:48 UTC