Re: Multiple itemtypes in microdata

On Tue, 18 Oct 2011, Gregg Kellogg wrote:
> 
> Hixie, note that I raised property URI generation as ISSUE-1 [1] (along 
> with other transformation issues). From reading the HTML/Microdata spec, 
> it would seem that processors really need to have vocabulary-specific 
> rules for interpreting these rules. This is important for property URI 
> generation, but also for maintaining value order and specifying 
> per-property literal datatypes.

Yes, all the use cases for microdata were things where it made no sense 
for software to do anything with the data unless it knew what the data 
meant, so the assumption is that the microdata processing software knows 
the vocabulary.

This is similar to how XML processors are expected to have 
namespace-specific knowledge to be useful. Sure, you can have generic XML 
or microdata (or JSON or...) parsers, but to do anything useful with the 
data, you have to stick those parsers onto a frontend that knows about the 
data itself.
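
As a rough illustration of that split, here is a minimal Python sketch of a frontend sitting on top of a generic parser. Everything in it is an assumption made for the example: the dictionary layout of a parsed item, the names HANDLERS, handle_vcard and process_item, and the example.com type URL are all hypothetical and do not come from the HTML specification.

```python
# Hypothetical sketch: none of these names come from the HTML specification.
VCARD_TYPE = "http://microformats.org/profile/hcard"


def handle_vcard(properties):
    # Vocabulary-specific knowledge: "fn" is the formatted name of the card.
    names = properties.get("fn", [])
    return {"kind": "contact", "display_name": names[0] if names else None}


# The frontend only understands the vocabularies registered here.
HANDLERS = {VCARD_TYPE: handle_vcard}


def process_item(item):
    """Dispatch a generically parsed item to the code that knows its vocabulary."""
    handler = HANDLERS.get(item.get("type"))
    if handler is None:
        # The syntax was parsed fine, but nothing here knows what the data means.
        return None
    return handler(item.get("properties", {}))


# Assumed output shape of some generic parser: an item type plus property lists.
known = {"type": VCARD_TYPE, "properties": {"fn": ["Jill Darpa"]}}
unknown = {
    "type": "http://example.com/unknown-vocabulary",
    "properties": {"fn": ["Jill Darpa"]},
}
```

The generic layer produces the same structure for both items; only the registered handler turns one of them into something usable.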


> The alternatives are:
> 
> 1) bake in support for each vocabulary into a conformant processor

This is the assumption that microdata is built around.


> 2) read a vocabulary document (i.e., RDFS or OWL) and determine 
> processing rules from rdfs:range/rdfs:domain specifications

Generally speaking, no language exists that is expressive enough to 
actually describe vocabularies in sufficient detail to make this practical 
for the kinds of vocabularies that microdata's use cases involve.


> 3) do nothing, use a single processing algorithm that is generic across 
> all vocabularies and leave it to post-processing to perform 
> vocabulary-specific modifications. (Although this does not really 
> address property URI generation variation between vocabularies defined 
> in HTML and other RDF vocabularies).

I don't really understand what this means. What does RDF have to do with 
microdata in this context?


> Note, that if the HTML spec specified 
> http://microformats.org/profile/hcard# as the vCard type, instead of 
> just http://microformats.org/profile/hcard, properties would be 
> generated relative to the type using processing rules currently 
> described in [2], which is intended to be compatible with 
> schema.org and other RDF vocabularies.

The properties in the microdata vCard vocabulary aren't URLs, and it would 
be incorrect to treat them as URLs. They are "defined property names" in 
the sense defined in the HTML specification.

This has implications. For example, it would be invalid to treat these two 
microdata fragments as equivalent in any way:

   <address itemscope itemtype="http://microformats.org/profile/hcard">
    Written by
    <span itemprop="fn">
     <span itemprop="n" itemscope>
      <span itemprop="given-name">Jill</span>
      <span itemprop="family-name">Darpa</span>
     </span>
    </span>
   </address>

   <address itemscope itemtype="http://microformats.org/profile/hcard">
    Written by
    <span itemprop="http://microformats.org/profile/hcard#fn">
     <span itemprop="http://microformats.org/profile/hcard#n" itemscope>
      <span itemprop="http://microformats.org/profile/hcard#n/given-name">Jill</span>
      <span itemprop="http://microformats.org/profile/hcard#n/family-name">Darpa</span>
     </span>
    </span>
   </address>

Any software that handled the above in equivalent ways (e.g. finding a 
vCard with a name "Jill Darpa" in the second case) would be a 
non-conforming implementation of the vCard microdata vocabulary.
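
To make the distinction concrete, here is a minimal Python sketch of how a vCard-aware consumer might look up the formatted name in each of the two fragments. The dictionary layout, the function name vcard_formatted_name, and the whitespace-normalized values are assumptions made for illustration; they are not taken from the HTML specification.

```python
VCARD_TYPE = "http://microformats.org/profile/hcard"


def vcard_formatted_name(item):
    """Return the vCard formatted name, or None if the item has no "fn" property."""
    if item.get("type") != VCARD_TYPE:
        return None
    # Property names are compared as exact strings. A URL that merely ends
    # in "fn" is a different property and is deliberately not matched.
    values = item.get("properties", {}).get("fn", [])
    return values[0] if values else None


# Assumed result of generically parsing the first fragment
# (whitespace in the text values is normalized here for brevity).
first = {
    "type": VCARD_TYPE,
    "properties": {
        "fn": ["Jill Darpa"],
        "n": [{"properties": {"given-name": ["Jill"], "family-name": ["Darpa"]}}],
    },
}

# Assumed result of generically parsing the second fragment.
second = {
    "type": VCARD_TYPE,
    "properties": {
        "http://microformats.org/profile/hcard#fn": ["Jill Darpa"],
        "http://microformats.org/profile/hcard#n": [
            {
                "properties": {
                    "http://microformats.org/profile/hcard#n/given-name": ["Jill"],
                    "http://microformats.org/profile/hcard#n/family-name": ["Darpa"],
                }
            }
        ],
    },
}

print(vcard_formatted_name(first))   # Jill Darpa
print(vcard_formatted_name(second))  # None
```

The second item is still a vCard item by type, but it carries none of the vocabulary's defined property names, so no name is found.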

(This is why, when there was a generic HTML to RDF conversion algorithm in 
the HTML spec, it went to some lengths to ensure that the URLs generated 
on the RDF side could not be present in conforming microdata -- it ensured 
that there was no way to end up in this confusing situation where two 
different conforming property names had the same semantics.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 19 October 2011 00:04:43 UTC