Re: The harm that can come if the W3C supports publication of competing specs from Philip Jägenstedt on 2010-01-17 (public-html@w3.org from January 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Sun, 17 Jan 2010 13:25:41 +0100
To: "Graham Klyne" <GK@ninebynine.org>, "Shelley Powers" <shelley.just@gmail.com>
Cc: "HTMLWG WG" <public-html@w3.org>
Message-ID: <op.u6ofg3sgsr6mfa@worf>

On Sun, 17 Jan 2010 10:27:00 +0100, Graham Klyne <GK@ninebynine.org> wrote:

> Having skimmed the messages in this thread to date, I feel the focus on  
> *browser* implementations is ignoring wider concerns of non-browser  
> applications that consume the embedded data, and that the focus on  
> syntax is distracting from what really matters, the underlying semantic  
> model.

I agree that browsers will be minority consumers of microdata. But I also
feel that a model for which good DOM APIs can be written, and which is
possible (or rather, desirable) to implement in browsers, makes the
metadata much more useful to, and more likely to be used by, people
outside the traditional semantic web community. Since the RDF model is a
graph, it is hard to see how it could be represented using
HTMLCollection-like interfaces (you would need a query language for it
to be useful) or mapped to JavaScript (you can construct the objects with
some effort, but can't serialize them as JSON if the graph has loops).
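
To illustrate the serialization point, here is a minimal sketch in Python,
whose json module refuses circular structures much as JSON.stringify does
in JavaScript. The resources (alice, bob), the "knows" property and the
try_serialize helper are all made up for the example.

```python
import json

# Two hypothetical resources that reference each other, forming a loop.
alice = {"name": "Alice"}
bob = {"name": "Bob", "knows": alice}
alice["knows"] = bob  # closes the cycle: alice -> bob -> alice


def try_serialize(obj):
    """Return the JSON text, or None if the structure contains a cycle."""
    try:
        return json.dumps(obj)
    except ValueError:
        # json.dumps raises ValueError when it detects a circular reference.
        return None


# A tree-shaped structure with the same data but no back-reference.
tree = {"name": "Alice", "knows": {"name": "Bob"}}
```

The tree-shaped value serializes without trouble, while the cyclic one
cannot be written out as JSON unless the loop is first broken, for example
by replacing the back-reference with an identifier.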

> The mediawiki thread cited by Shelley notes that there is some ambiguity  
> in the semantics of the microdata presentation, but that's relatively  
> easily fixed, I think (just ensure the unqualified properties are mapped  
> implicitly to a full URI, which in turn is described by an RDF schema or  
> OWL).

If itemtype is not used, then the data has no semantics outside of the  
page, and using it is as unsafe as e.g. scraping HTML tables. The RDF  
extraction algorithm doesn't include untyped items, and rightly so. I  
wouldn't really call this ambiguity, but possibly the spec could be more  
explicit about this. In the extreme one could even require itemtype to be  
used, but I think that would harm useful site-private use of microdata.
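
As a toy sketch of that behaviour (not the spec's actual extraction
algorithm), the following skips any item that has no type. The item
layout, the blank-node labels and the way property URIs are built from
the type are all assumptions made for the illustration.

```python
def extract_triples(items):
    """Emit (subject, predicate, object) triples for typed items only."""
    triples = []
    for index, item in enumerate(items):
        item_type = item.get("type")
        if not item_type:
            # Untyped items have no meaning outside the page, so skip them.
            continue
        subject = f"_:item{index}"  # hypothetical blank-node label
        triples.append((subject, "rdf:type", item_type))
        for name, values in item.get("properties", {}).items():
            # Assumed mapping: qualify the short name with the item type.
            predicate = f"{item_type}#{name}"
            for value in values:
                triples.append((subject, predicate, value))
    return triples


items = [
    {"type": "http://example.org/Person", "properties": {"name": ["Philip"]}},
    {"properties": {"name": ["Anonymous"]}},  # untyped, site-private item
]
```

Only the first item contributes triples here; the second stays available
to scripts on the page but is invisible to the extraction.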

> So while I agree that it is very unfortunate that there are two  
> competing proposals, I don't think it's fatal *provided* that there is a  
> clear mapping to underlying common semantics.  In this case, RDF is  
> firmly established in W3C as a semantic model for metadata, so I argue  
> that all proposals should provide a clear mapping from their particular  
> syntax to the RDF abstract syntax.  The existing semantics of RDF can  
> take care of the rest.

I would agree: RDF is a well-established model and there's nothing much
wrong with it. Presently, the only RDF concept I'm aware of that can't be
expressed using microdata is XML Schema Datatypes (XSD). I would argue
that the datatypes should be defined in the vocabulary and not by the
author, so I consider this restriction quite sensible. This only seems to
matter if you're trying to embed, verbatim, RDF data that you have no
control over, in which case I would argue that you shouldn't bother with
either microdata or RDFa and simply link to an external N3/Turtle
representation.
However, if a use case other than "express arbitrary RDF" requires XSD, it  
certainly wouldn't be too late to add it to microdata as itemproptype or  
something. (I would be interested in hearing about such use cases.)
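
To make "datatypes defined in the vocabulary" concrete, here is a small
sketch in which the consumer looks up how to interpret each property
value from a vocabulary table, rather than from markup supplied by the
author. The vocabulary URL, the property names and the typed_value helper
are hypothetical.

```python
from datetime import date

# Hypothetical vocabulary: each property name maps to the parser that
# turns the author's plain string into a typed value.
VOCABULARY = {
    "http://example.org/Event": {
        "name": str,
        "attendees": int,
        "start": date.fromisoformat,
    }
}


def typed_value(item_type, prop, raw):
    """Interpret a raw string according to the vocabulary, if it is known."""
    parser = VOCABULARY.get(item_type, {}).get(prop, str)
    return parser(raw)
```

The author only ever writes strings such as "2010-01-17"; whether that is
a date is decided once, by the vocabulary, for every page that uses it.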

-- 
Philip Jägenstedt

Received on Sunday, 17 January 2010 12:25:51 UTC