Re: RDFa in HTML 5 from Shelley Powers on 2009-05-24 (public-html@w3.org from May 2009)

From: Shelley Powers <shelleyp@burningbird.net>
Date: Sun, 24 May 2009 07:42:28 -0500
To: Maciej Stachowiak <mjs@apple.com>
CC: Sam Ruby <rubys@intertwingly.net>, Manu Sporny <msporny@digitalbazaar.com>, Philip Taylor <pjt47@cam.ac.uk>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A1940B4.7020200@burningbird.net>

Maciej Stachowiak wrote:
>
> On May 23, 2009, at 7:01 AM, Shelley Powers wrote:
>
>> We can presume that programming logic is the same regardless of 
>> whether it is implemented in PHP, Python, or JavaScript. If so, then 
>> one can presume that JavaScript developers can read the same specs as 
>> developers in other programming languages. The English used is 
>> relatively simple. Not too many big words.
>
> What's different about JavaScript is that it's highly likely to use a 
> DOM already created by a browser or other html user agent. Meanwhile, 
> other languages would either parse directly, or use an off-the-shelf 
> parser. If in-page JavaScript extracts different triples from an 
> offline Python script, I hope we can agree that's a problem. There 
> could be various reasons for this to occur; here are two possible risks:
>
> 1) The RDFa-in-HTML spec may have processing requirements cast 
> directly in terms of markup, which give different results than what 
> would happen by first applying an HTML parser and then using the 
> resulting DOM. In some cases, this may be impossible to reconcile 
> without changing RDFa-in-HTML rules, for example if those rules rely 
> on information that is lost by the HTML parser in the course of 
> creating a DOM. Or it may simply be that the script would have to go 
> out of its way to replicate rules defined in terms of markup when 
> operating on the DOM, and the way to translate from one to the other 
> may not be entirely clear.
>
> 2) An offline processor written in Python may treat XHTML served as 
> text/html as XML, since there are so many off-the-shelf XML parsing 
> libraries and the script author may be unaware of the off-the-shelf 
> HTML5 parsers now available. If there is any difference between 
> text/html and application/xml processing rules for the same document, 
> this will almost certainly result in divergence in at least some 
> cases. Thus, we need to either ensure identical processing or make 
> it very clear that text/html must never be processed as XML by an 
> RDFa processor.
>
> I think an important role for an RDFa-in-HTML spec is to mitigate 
> these risks. For any form of embedded data in Web content, it's very 
> important that different tools all get the same answer, even in edge 
> cases.
>

I do understand the DOM versus direct parsing issue. I've taken 
some of Philip's test cases and some of Shane's, and have been playing 
with both using rdfquery, so I am aware of the issues.
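
As a rough illustration of the kind of divergence at issue, here is a minimal 
sketch using only the Python standard library. It is not one of Philip's or 
Shane's test cases; the markup and the helper names (AttrCollector, 
attrs_via_html, attrs_via_xml) are made up for the example. It feeds the same 
fragment to an HTML tokenizer and to a namespace-aware XML parser and lists 
the attributes each one reports.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical fragment: an RDFa prefix declared with xmlns:dc.
DC_NS = "http://purl.org/dc/elements/1.1/"
MARKUP = (
    '<div xmlns:dc="' + DC_NS + '">'
    '<span property="dc:title">Example</span>'
    '</div>'
)


class AttrCollector(HTMLParser):
    """Record (tag, attributes) for every start tag the HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


def attrs_via_html(markup):
    # HTML view: xmlns:dc is just an ordinary attribute named "xmlns:dc".
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


def attrs_via_xml(markup):
    # XML view: the parser consumes xmlns:dc as a namespace declaration,
    # so it does not show up in the element's attribute dictionary.
    root = ET.fromstring(markup)
    return [(el.tag, dict(el.attrib)) for el in root.iter()]


if __name__ == "__main__":
    print(attrs_via_html(MARKUP))
    print(attrs_via_xml(MARKUP))
```

In the HTML view the div carries an attribute named "xmlns:dc"; in the XML 
view its attribute dictionary is empty, so code that resolves the "dc:" 
prefix has to get the mapping from somewhere else in each case.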

But the underlying understanding of what should be delivered should be 
clear, regardless of what programming environment you use. That's what 
should be documented in an RDFa document: the triples that should be 
derived from the RDFa.

Every programming language has its own challenges. If they don't use a 
DOM, they most likely use other libraries. Regardless, all will have 
quirks. But if you start recording information about the quirks, or the 
DOM, into the RDFa in HTML document, you tie it physically to a current 
state of both. Thus if the underlying DOM changes, the RDFa in HTML 
document will have to change, even though RDFa hasn't, itself, changed.

If anything, the RDFa test cases that Philip generated demonstrate that 
there are some issues with the underlying DOM in HTML, and the right 
place to address these is in the documentation of the DOM. They are not 
specific to RDFa. And providing details about how to generate RDF 
triples, or creating new microdata sections, is not the right way to 
address these problems. (Other than in tutorials or published notes, 
which should be encouraged.)

The RDFa folks will do what they want; I have no input into any future 
documentation. Perhaps they will decide it's worth it to include the 
section you believe is important. I hope not, but that's their choice. I 
just hope that the HTML WG realizes that this doesn't mean the issues 
that Philip demonstrated, and that have been discussed in this thread, 
should be ignored.

Shelley

Received on Sunday, 24 May 2009 12:43:27 UTC