Re: The harm that can come if the W3C supports publication of competing specs from Philip Jägenstedt on 2010-01-17 (public-html@w3.org from January 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Sun, 17 Jan 2010 19:37:26 +0100
To: "Graham Klyne" <GK@ninebynine.org>
Cc: "Shelley Powers" <shelley.just@gmail.com>, "HTMLWG WG" <public-html@w3.org>
Message-ID: <op.u6owoobfsr6mfa@worf>

On Sun, 17 Jan 2010 14:55:43 +0100, Graham Klyne <GK@ninebynine.org> wrote:

> Philip Jägenstedt wrote:
>> On Sun, 17 Jan 2010 10:27:00 +0100, Graham Klyne <GK@ninebynine.org>  
>> wrote:

>> ... Since the RDF model is a graph, it is hard to see how it could be  
>> represented using HTMLCollection-like interfaces (you would need a  
>> query language for it to be useful) or mapped to JavaScript (you can  
>> construct the objects with some effort, but can't serialize them as  
>> JSON if the graph has loops).
>
> Hmmm... I'm not sure I fully follow this.  But turning it around,  
> RDFquery [1] is an example of an API that extracts RDF data from RDFa in  
> an HTML DOM.  I mention this as an existence proof, no more.
>
> [1] http://code.google.com/p/rdfquery/

I looked at rdfquery briefly when implementing the RDF extraction  
algorithm for microdatajs. [1] While I ended up not using it and may have  
misunderstood its purpose, it looks basically like a triplestore  
implemented in JavaScript which you can perform advanced (?) queries on.  
This is pretty neat of course, especially if you like graph theory, logic,  
etc. (which I do). However, I don't think it is the best kind of interface  
to get at the data in the document when most of the time the author  
doesn't care about this fancy graph model and just wants to know what the  
colors of the cats are, for example. Also, implementing an efficient  
network database (for the triplestore) is actually quite difficult [2] and  
not something I expect browser vendors would want to spend time on unless  
the benefits are very substantial.
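
To make the contrast concrete, here is a minimal sketch in JavaScript. The
data, the property URIs and the match() helper are all hypothetical; this is
neither rdfquery's API nor anything from microdatajs, just an illustration of
graph-style access next to plain item-style access.

```javascript
// Hypothetical data: a tiny "triplestore" as an array of
// [subject, predicate, object] triples.
const catTriples = [
  ["_:cat1", "http://example.org/name", "Hedral"],
  ["_:cat1", "http://example.org/color", "black"],
  ["_:cat2", "http://example.org/name", "Pillow"],
  ["_:cat2", "http://example.org/color", "white"],
];

// Graph-style access: match a triple pattern, where null is a wildcard.
// (Hypothetical helper; a real store would index triples instead of
// scanning the whole array.)
function match(store, subject, predicate, object) {
  return store.filter(([s, p, o]) =>
    (subject === null || subject === s) &&
    (predicate === null || predicate === p) &&
    (object === null || object === o));
}

const colorsFromGraph = match(catTriples, null, "http://example.org/color", null)
  .map((triple) => triple[2]);

// Item-style access: each item is a plain object with named properties,
// so "what colors are the cats?" is a simple property lookup.
const catItems = [
  { name: "Hedral", color: "black" },
  { name: "Pillow", color: "white" },
];
const colorsFromItems = catItems.map((cat) => cat.color);

console.log(colorsFromGraph);
console.log(colorsFromItems);
```

Both routes give the same answer; the difference is how much machinery the
author has to understand (and the browser has to implement) to get there.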

Note: I haven't seen any concrete proposals for a DOM API for a graph  
database (which the RDF model implies); perhaps there are solutions that I  
just don't know about. The above is just my guess at how things would have  
to work.
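
The serialization point quoted earlier (objects can be constructed, but not
serialized as JSON if the graph has loops) can be shown with a short sketch.
The resource names and the trySerialize() helper are made up for the example;
the only real API used is JSON.stringify, which throws a TypeError on
circular structures.

```javascript
// Two hypothetical resources that reference each other, forming a loop.
const alice = { name: "Alice", knows: [] };
const bob = { name: "Bob", knows: [] };
alice.knows.push(bob);
bob.knows.push(alice);

// Hypothetical helper: return the JSON text, or the name of the failure.
function trySerialize(value) {
  try {
    return JSON.stringify(value);
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : "other error";
  }
}

// The object graph contains a cycle, so serialization fails.
const cyclicResult = trySerialize(alice);

// The same information as a flat list of triples has no object cycles,
// because resources are referred to by identifier rather than by reference.
const knowsTriples = [
  ["alice", "knows", "bob"],
  ["bob", "knows", "alice"],
];
const flatResult = trySerialize(knowsTriples);

console.log(cyclicResult);
console.log(flatResult);
```

A flat triple list does serialize, but it gives up the convenient nested
objects, which is part of why the mapping to JavaScript is awkward.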

>>> The mediawiki thread cited by Shelley notes that there is some  
>>> ambiguity in the semantics of the microdata presentation, but that's  
>>> relatively easily fixed, I think (just ensure the unqualified  
>>> properties are mapped implicitly to a full URI, which in turn is  
>>> described by an RDF schema or OWL).
>> If itemtype is not used, then the data has no semantics outside of the  
>> page, and using it is as unsafe as e.g. scraping HTML tables. The RDF  
>> extraction algorithm doesn't include untyped items, as it shouldn't. I  
>> wouldn't really call this ambiguity, but possibly the spec could be  
>> more explicit about this. In the extreme one could even require  
>> itemtype to be used, but I think that would harm useful site-private  
>> use of microdata.
>
> I would strongly prefer that local-only semantics were avoided, if  
> possible. Things have a way of leaking out from the place in which they  
> first appear. Maybe a "global" semantics could be attached to a URI that  
> is derived from the URI of the containing page, or the in-scope base  
> URI, if explicit typing is not provided.

I agree that this is problematic. However, I think assigning no semantics  
at all is preferable to using some kind of global namespace, as it should  
be very clear that untyped data should not be scraped or syndicated by  
third parties. Validators could also be aggressive about discouraging  
untyped data, but on balance I don't think that's a good idea.

> I still think it's preferable to have a single spec that meets all  
> requirements.

I agree. If there are requirements that are not met by microdata, now  
would be a good time to give feedback on the spec.

[1] http://foolip.org/microdatajs/demo/turtle.html
[2] http://neo4j.org/

-- 
Philip Jägenstedt

Received on Sunday, 17 January 2010 18:37:28 UTC