
Re: XMLLiterals and c14n

From: Ivan Herman <ivan@w3.org>
Date: Mon, 07 Sep 2009 17:17:27 +0200
Message-ID: <4AA52407.2040408@w3.org>
To: Philip Taylor <pjt47@cam.ac.uk>
CC: Manu Sporny <msporny@digitalbazaar.com>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Sigh. This is indeed a slightly muddy area where the RDF Concepts
document should have been written differently. But, well, this is not
something either of these two working groups can do...

I think the issue is that the RDF Concepts spec describes the abstract
concepts for abstract RDF graphs, and not a serialization thereof. If
one looks at the production rules in the RDF/XML specification:

http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-productions

namely 7.2.17, the production rule says "l is transformed into the
lexical form of an XML literal in the RDF graph x". I.e., the RDF/XML
encoding allows for XML that _can be transformed_ into a canonical
XML format (and that is how all RDF/XML parsers behave, as far as I
know).

The situation for RDFa is the same. From an RDF point of view, RDFa is
another serialization of RDF, and its behaviour in this sense is
identical to RDF/XML. I.e., it is not required that the HTML content
should be in C14N format (that would be impractical), but the content
should be transformed (at least conceptually) when generating an
abstract RDF graph by whoever consumes RDFa. I believe the RDFa spec is
technically correct.

(On a practical level, all RDF environments and serializations I know
about behave similarly: they would take any (valid) XML as an XML
Literal, and the C14N comes into the picture when two XML literals are
compared, e.g., for equality.)
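
A minimal sketch of such an equality check, assuming Python 3.8+ and its
standard-library xml.etree.ElementTree.canonicalize. That function
implements Canonical XML 2.0, not the Exclusive C14N 1.0 that RDF
Concepts names, so it is only a stand-in here. The helper name
xml_literals_equal and the sample strings are hypothetical, and each
literal is assumed to have a single root element.

```python
from xml.etree.ElementTree import canonicalize


def xml_literals_equal(a: str, b: str) -> bool:
    """Compare two XML literal strings by their canonical forms.

    Assumption: canonicalize() (C14N 2.0) stands in for Exclusive C14N,
    and each input is a well-formed document with one root element.
    """
    return canonicalize(a) == canonicalize(b)


# Same element, but with different attribute order, quoting style,
# and empty-element syntax.
literal_a = '<rect xmlns="http://www.w3.org/2000/svg" width="300" height="100"/>'
literal_b = "<rect height='100' width='300' xmlns='http://www.w3.org/2000/svg'></rect>"

strings_identical = literal_a == literal_b
canonically_equal = xml_literals_equal(literal_a, literal_b)
print(strings_identical, canonically_equal)
```

The two strings differ byte for byte, yet they compare equal once both
are canonicalized, which is the point at which C14N matters in practice.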

My 2 pence...

Ivan

Philip Taylor wrote:
> Manu Sporny wrote:
>> The most recent HTML+RDFa draft can be found here:
>>
>> http://html5.digitalbazaar.com/specs/rdfa.html
> 
> Section 4.2 now says:
> 
>> The markup above should produce the following triple: <>   
>> <http://example.org/vocab#markup>
>>       "<rect xmlns=\"http://www.w3.org/2000/svg\"
>> xmlns:ex=\"http://example.org/vocab#\" →
>> xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"
>> width=\"300\" → height=\"100\"
>> style=\"fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)\"/>
>> → <rect xmlns=\"http://www.w3.org/2000/svg\"
>> xmlns:ex=\"http://example.org/vocab#\" →
>> xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" width=\"50\"
>> → height=\"50\" style=\"fill:rgb(255,0,0);stroke-width:2; →
>> stroke:rgb(0,0,0)\"/>"^^http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral
>>
> 
> As far as I can tell, that violates the RDF specs.
> http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral says "The lexical
> space is the set of all strings ... for which encoding as UTF-8 [RFC
> 2279] yields exclusive Canonical XML (with comments, with empty
> InclusiveNamespaces PrefixList) [XML-XC14N]". The XML in the spec is not
> in Exclusive Canonical Form (in particular I believe the xmlns:ex and
> xmlns:rdf must not be present), so it's not a legal XMLLiteral string.
> 
> This seems a slightly more widespread problem within RDFa, e.g.
> http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.sparql
> explicitly permits output which violates the definition of XMLLiteral -
> I think it should be updated to only permit output which is valid RDF
> (i.e. with XC14N-style XMLLiterals).
> http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.sparql
> has the same issue, and presumably other XMLLiteral tests do too.
> 
> http://www.w3.org/TR/rdfa-syntax/ says:
> 
>> The value of the [XML literal] is a string created by serializing to
>> text, all nodes that are descendants of the [current element], i.e., not
>> including the element itself, and giving it a datatype of
>> rdf:XMLLiteral.
> 
> which should be updated to state that the descendants must be serialized
> with the Exclusive XML Canonicalization algorithm. Similarly, the
> HTML+RDFa draft should refer to that algorithm instead of (or in
> addition to?) HTML5's #serializing-xhtml-fragments algorithm.
> 
> (Separately from the c14n issue,
> http://www.w3.org/TR/rdfa-syntax/#s_xml_literals says the expected
> output for one example is '<> dc:title "E = mc<sup>2</sup>: The Most
> Urgent Problem of Our Time"^^rdf:XMLLiteral', which is incorrect because
> it's lost the HTML namespace of the <sup> element.)
> 
> Another concern with c14n: My understanding is that Exclusive C14n only
> includes namespace declarations when the namespaces are "visibly
> utilized" by element or attribute names, and namespaces used only in
> CURIEs in attribute values are not visibly utilized, so their
> declarations will be removed. So Exclusive C14n of an RDFa document or
> fragment will almost certainly destroy its RDFa content. The only ways I
> can see to fix that are to change rdf-concepts to not require XC14N, or
> to change RDFa to not use XML Namespaces, though it might not be a
> problem that's worth fixing.
> 
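
The namespace concern raised at the end of the quoted message can be
illustrated with a small sketch. It assumes Python 3.8+ and the
standard-library xml.etree.ElementTree.canonicalize, which implements
Canonical XML 2.0 rather than Exclusive C14N 1.0, but which likewise
emits only the namespace declarations used by element or attribute
names. The fragment and the "ex" prefix are hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# The "ex" prefix is used only inside a CURIE in an attribute value,
# never in an element or attribute name.
fragment = (
    '<span xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:ex="http://example.org/vocab#" '
    'property="ex:markup">hello</span>'
)

canonical = canonicalize(fragment)
print(canonical)

# The CURIE survives as plain attribute text, but the declaration that
# gave its prefix a meaning is no longer part of the output.
prefix_declared = "xmlns:ex" in canonical
curie_present = 'property="ex:markup"' in canonical
```

After canonicalization the attribute value still reads "ex:markup", but
nothing in the fragment says what "ex" maps to, so an RDFa processor
reading the canonical form alone could not resolve the CURIE.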

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf


Received on Monday, 7 September 2009 15:18:11 GMT
