Re: RDFa in HTML 5 from Maciej Stachowiak on 2009-05-24 (public-html@w3.org from May 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sat, 23 May 2009 23:57:16 -0700
To: Shelley Powers <shelleyp@burningbird.net>
Cc: Sam Ruby <rubys@intertwingly.net>, Manu Sporny <msporny@digitalbazaar.com>, Philip Taylor <pjt47@cam.ac.uk>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-id: <B50E8B2A-448B-4AEE-9901-6B5615C7488E@apple.com>

On May 23, 2009, at 7:01 AM, Shelley Powers wrote:

> We can presume that programming logic is the same regardless of  
> whether it is implemented in PHP, Python, or JavaScript. If so, then  
> one can presume that JavaScript developers can read the same specs  
> as developers in other programming languages. The English used is  
> relatively simple. Not too many big words.

What's different about JavaScript is that it's highly likely to use a
DOM already created by a browser or other HTML user agent. Meanwhile,
other languages would either parse the markup directly or use an
off-the-shelf parser. If in-page JavaScript extracts different triples
than an offline Python script does, I hope we can agree that's a
problem. There are various ways this could occur; here are two
possible risks:

1) The RDFa-in-HTML spec may have processing requirements cast  
directly in terms of markup, which give different results than what  
would happen by first applying an HTML parser and then using the  
resulting DOM. In some cases, this may be impossible to reconcile  
without changing RDFa-in-HTML rules, for example if those rules rely  
on information that is lost by the HTML parser in the course of  
creating a DOM. Or it may simply be that the script would have to go  
out of its way to replicate rules defined in terms of markup when  
operating on the DOM, and the way to translate from one to the other  
may not be entirely clear.

2) An offline processor written in Python may treat XHTML served as  
text/html as XML, since there are so many off-the-shelf XML parsing  
libraries and the script author may be unaware of the off-the-shelf  
HTML5 parsers now available. If there is any difference between
text/html and application/xml processing rules for the same document,
this will almost certainly result in divergence in at least some
cases. Thus, we need to do at least one of two things: ensure
identical processing, or make it very clear that text/html must never
be processed as XML by an RDFa processor.
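
To make the second risk concrete, here is a minimal sketch in Python
that feeds one made-up fragment to two standard-library parsers and
compares the attributes each one reports. The fragment, the helper
names (attrs_via_html, attrs_via_xml, has_property) and the choice of
parsers are all assumptions for illustration. Note that html.parser is
a lenient tokenizer, not a conforming HTML5 parser, so it only
approximates what a browser's DOM would contain.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical fragment: a prefix declaration plus a mixed-case attribute.
DOC = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<span Property="dc:title">Hello</span>'
    '</div>'
)


class AttrCollector(HTMLParser):
    """Records (tag, attributes) for every start tag, HTML-style."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        self.seen.append((tag, dict(attrs)))


def attrs_via_html(markup):
    """Attributes as seen when the markup is treated as text/html."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


def attrs_via_xml(markup):
    """Attributes as seen when the same markup is treated as XML."""
    root = ET.fromstring(markup)
    # ElementTree consumes xmlns:* as namespace declarations, so they
    # do not show up in el.attrib; attribute names keep their case.
    return [(el.tag, dict(el.attrib)) for el in root.iter()]


def has_property(elements):
    """Naive check a processor might make for a 'property' attribute."""
    return any("property" in attrs for _tag, attrs in elements)


if __name__ == "__main__":
    print("html:", attrs_via_html(DOC))
    print("xml: ", attrs_via_xml(DOC))
    print("html sees property:", has_property(attrs_via_html(DOC)))
    print("xml sees property: ", has_property(attrs_via_xml(DOC)))
```

In this sketch the HTML path reports xmlns:dc as an ordinary attribute
and lowercases Property, while the XML path hides the declaration and
preserves the case. A processor that looks for a lowercase property
attribute would therefore find data on one path and nothing on the
other, which is the kind of divergence a spec needs to rule out.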

I think an important role for an RDFa-in-HTML spec is to mitigate  
these risks. For any form of embedded data in Web content, it's very  
important that different tools all get the same answer, even in edge  
cases.

Regards,
Maciej

Received on Sunday, 24 May 2009 06:58:10 UTC