
XMLLiterals and c14n (was: HTML+RDFa (2nd draft))

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Mon, 07 Sep 2009 15:25:18 +0100
Message-ID: <4AA517CE.7080407@cam.ac.uk>
To: Manu Sporny <msporny@digitalbazaar.com>
CC: HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Manu Sporny wrote:
> The most recent HTML+RDFa draft can be found here:
> 
> http://html5.digitalbazaar.com/specs/rdfa.html

Section 4.2 now says:

> The markup above should produce the following triple: 
> <> 
>    <http://example.org/vocab#markup>
>       "<rect xmlns=\"http://www.w3.org/2000/svg\" xmlns:ex=\"http://example.org/vocab#\" 
> → xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" width=\"300\" 
> → height=\"100\" style=\"fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)\"/>
> → <rect xmlns=\"http://www.w3.org/2000/svg\" xmlns:ex=\"http://example.org/vocab#\" 
> → xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" width=\"50\" 
> → height=\"50\" style=\"fill:rgb(255,0,0);stroke-width:2; 
> → stroke:rgb(0,0,0)\"/>"^^http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral

As far as I can tell, that violates the RDF specs. 
http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral says "The lexical 
space is the set of all strings ... for which encoding as UTF-8 [RFC 
2279] yields exclusive Canonical XML (with comments, with empty 
InclusiveNamespaces PrefixList) [XML-XC14N]". The XML in the spec is not 
in Exclusive Canonical Form (in particular I believe the xmlns:ex and 
xmlns:rdf must not be present), so it's not a legal XMLLiteral string.
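For the SVG example above, the "visibly utilized" rule can be sketched as follows. This is a simplified model, not a full Exclusive C14N implementation. The function name `exc_c14n_namespace_decls` and the in-scope namespace map are illustrative assumptions. The sketch also assumes that each `rect` is a top-level (apex) node of the literal, with no output ancestor and an empty InclusiveNamespaces PrefixList. Only the `rect` element name uses a namespace (the default SVG one), so `xmlns:ex` and `xmlns:rdf` are not rendered.

```python
# Hypothetical in-scope declarations inherited from the surrounding document.
SVG = "http://www.w3.org/2000/svg"
IN_SCOPE = {
    "": SVG,
    "ex": "http://example.org/vocab#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def _prefix(qname, unprefixed):
    """Prefix of a QName, or `unprefixed` when the name has no colon."""
    return qname.split(":", 1)[0] if ":" in qname else unprefixed


def exc_c14n_namespace_decls(elem_qname, attr_qnames, in_scope):
    """Declarations exclusive C14N renders on an apex element (sketch).

    Assumes an empty InclusiveNamespaces PrefixList and no output ancestor,
    as for a top-level element of an XMLLiteral. A prefix is declared only if
    it is visibly utilized by the element name or by an attribute *name*.
    """
    used = {_prefix(elem_qname, "")}  # unprefixed element name: default namespace
    used |= {_prefix(a, None) for a in attr_qnames} - {None}  # unprefixed attribute: no namespace
    return {p: in_scope[p] for p in sorted(used) if p in in_scope}


print(exc_c14n_namespace_decls("rect", ["width", "height", "style"], IN_SCOPE))
# {'': 'http://www.w3.org/2000/svg'}
```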

This seems to be a somewhat more widespread problem within RDFa, e.g. 
http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.sparql 
explicitly permits output which violates the definition of XMLLiteral - 
I think it should be updated to only permit output which is valid RDF 
(i.e. with XC14N-style XMLLiterals). 
http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.sparql 
has the same issue, and presumably other XMLLiteral tests do too.

http://www.w3.org/TR/rdfa-syntax/ says:

> The value of the [XML literal] is a string created by serializing to
> text, all nodes that are descendants of the [current element], i.e., not
> including the element itself, and giving it a datatype of
> rdf:XMLLiteral.

which should be updated to state that the descendants must be serialized 
with the Exclusive XML Canonicalization algorithm. Similarly, the 
HTML+RDFa draft should refer to that algorithm instead of (or in 
addition to?) HTML5's #serializing-xhtml-fragments algorithm.

(Separately from the c14n issue, 
http://www.w3.org/TR/rdfa-syntax/#s_xml_literals says the expected 
output for one example is '<> dc:title "E = mc<sup>2</sup>: The Most 
Urgent Problem of Our Time"^^rdf:XMLLiteral', which is incorrect because 
it's lost the HTML namespace of the <sup> element.)
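A toy serializer can make the proposed rule concrete: serialize the descendants of the current element, not the element itself, in exclusive canonical form. The `Elem` class, `xml_literal`, and the surrounding `h2` markup are hypothetical. The sketch handles only unprefixed attributes, text and element children, and no namespace redeclarations, `xmlns=""` undeclarations, comments or processing instructions. Run on the dc:title example, it keeps the XHTML namespace on `<sup>`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

XHTML = "http://www.w3.org/1999/xhtml"


@dataclass
class Elem:
    """Hypothetical toy element: a QName, unprefixed attributes only, and
    children that are either Elem instances or text strings."""
    qname: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Elem", str]] = field(default_factory=list)


def _text(s: str) -> str:
    # Canonical XML escaping for text nodes.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\r", "&#xD;")


def _attr(s: str) -> str:
    # Canonical XML escaping for attribute values.
    return (s.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
             .replace("\t", "&#x9;").replace("\n", "&#xA;").replace("\r", "&#xD;"))


def _c14n(node, in_scope: Dict[str, str], rendered: Dict[str, str]) -> str:
    if isinstance(node, str):
        return _text(node)
    prefix = node.qname.split(":", 1)[0] if ":" in node.qname else ""
    uri = in_scope.get(prefix)
    out = "<" + node.qname
    if uri is not None and rendered.get(prefix) != uri:
        # Only the element's own prefix is visibly utilized (attributes are unprefixed).
        out += f' xmlns="{_attr(uri)}"' if prefix == "" else f' xmlns:{prefix}="{_attr(uri)}"'
        rendered = {**rendered, prefix: uri}
    for name, value in sorted(node.attrs.items()):  # unqualified attributes sort by name
        out += f' {name}="{_attr(value)}"'
    out += ">" + "".join(_c14n(c, in_scope, rendered) for c in node.children)
    return out + f"</{node.qname}>"  # empty elements become start/end tag pairs


def xml_literal(current: Elem, in_scope: Dict[str, str]) -> str:
    """Serialize the descendants of `current` (not `current` itself)."""
    return "".join(_c14n(child, in_scope, {}) for child in current.children)


h2 = Elem("h2", {"property": "dc:title", "datatype": "rdf:XMLLiteral"},
          ["E = mc", Elem("sup", children=["2"]), ": The Most Urgent Problem of Our Time"])
print(xml_literal(h2, {"": XHTML}))
# E = mc<sup xmlns="http://www.w3.org/1999/xhtml">2</sup>: The Most Urgent Problem of Our Time
```

Canonical form also sorts attributes and writes an empty element such as `<rect/>` as a start/end tag pair, so a conforming literal differs from a plain XHTML fragment serialization in more than its namespace declarations.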

Another concern with c14n: My understanding is that Exclusive C14n only 
includes namespace declarations when the namespaces are "visibly 
utilized" by element or attribute names, and namespaces used only in 
CURIEs in attribute values are not visibly utilized, so their 
declarations will be removed. So Exclusive C14n of an RDFa document or 
fragment will almost certainly destroy its RDFa content. The only ways I 
can see to fix that are to change rdf-concepts to not require XC14N, or 
to change RDFa to not use XML Namespaces, though it might not be a 
problem that's worth fixing.
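The CURIE problem can be checked with a similar sketch under the same simplifying assumptions. `Elem`, `CURIE_ATTRS` (a hypothetical subset of RDFa's CURIE-bearing attributes; safe CURIEs in `about` and `resource` are not handled) and `lost_curie_prefixes` are illustrative names. A prefix counts as declared in the canonical output only if some element name on the path from the fragment root uses it. A CURIE such as `ex:name` in an attribute value never causes a declaration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical, simplified list of RDFa attributes whose values hold CURIEs.
CURIE_ATTRS = ("property", "rel", "rev", "typeof", "datatype")


@dataclass
class Elem:
    qname: str
    attrs: Dict[str, str] = field(default_factory=dict)  # unprefixed attributes only
    children: List["Elem"] = field(default_factory=list)


def curie_prefixes(elem: Elem) -> Set[str]:
    """Prefixes referenced by CURIEs in attribute values ("ex:name" -> "ex")."""
    return {tok.split(":", 1)[0]
            for name in CURIE_ATTRS
            for tok in elem.attrs.get(name, "").split()
            if ":" in tok}


def lost_curie_prefixes(elem: Elem, in_scope: Dict[str, str],
                        declared: frozenset = frozenset()) -> Set[str]:
    """In-scope CURIE prefixes left undeclared after exclusive C14N of `elem`.

    Exclusive C14N declares a prefix only where an element or attribute *name*
    uses it; attribute values are opaque text, so CURIE prefixes do not count.
    """
    prefix = elem.qname.split(":", 1)[0] if ":" in elem.qname else ""
    if prefix in in_scope:
        declared = declared | {prefix}
    lost = {p for p in curie_prefixes(elem) if p in in_scope and p not in declared}
    for child in elem.children:
        lost |= lost_curie_prefixes(child, in_scope, declared)
    return lost


ns = {"": "http://www.w3.org/1999/xhtml", "ex": "http://example.org/vocab#"}
div = Elem("div", {"typeof": "ex:Person"}, [Elem("span", {"property": "ex:name"})])
print(lost_curie_prefixes(div, ns))
# {'ex'}
```

Under this model, the only way a CURIE prefix survives is for an element or attribute name in the fragment to use it, which ordinary RDFa markup does not do.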

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Monday, 7 September 2009 14:40:03 UTC
