
Re: Summary of strings, markup, and language tagging in RDF (resend)

From: Joseph Reagle <reagle@w3.org>
Date: Mon, 30 Jun 2003 16:14:12 -0400
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, Martin Duerst <duerst@w3.org>
Cc: Graham Klyne <gk@ninebynine.org>, Dan Connolly <connolly@w3.org>, w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, w3c-rdfcore-wg@w3.org
Message-Id: <200306301614.13031.reagle@w3.org>

On Monday 30 June 2003 15:46, Jeremy Carroll wrote:
> We lose xml:lang by using exc-c14n out of the box ... viz:
> [[
> attributes in the XML namespace, such as xml:lang and xml:space are not
> imported into orphan nodes of the document subset
> ]]

I'll note that while the issue of using exc-c14n arose out of my Last Call
comments, I haven't insisted on the use of exc-c14n. I merely noted that the
specs specified c14n in some places and exc-c14n in others, asked that they
be consistent, and recommended exc-c14n [1,2]. exc-c14n seems to be the
preferred choice nowadays since it permits subsets of XML to be easily
moved between contexts and it can be implemented a bit more easily. The ease
comes from the fact that the ancestor nodes of the subset don't have to be
crawled (or kept while parsing) to get ancestor xml attributes (e.g.,
xml:lang). (Ancestor namespace declarations are always in a descendant's
namespace axis, but xml attributes must be crawled.) However, I also
resisted calls for c14n to be completely deprecated. If people care about
context, it's a good choice for them.
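
To make the "crawling" concrete, here is a minimal Python sketch of the
ancestor walk needed to find the xml attributes in scope for a subset's apex
node. It is only an illustration, not an implementation of c14n: the function
name inherited_xml_attrs and the sample document are made up for this
example, and the standard-library ElementTree parser stands in for whatever
XML toolkit an application actually uses.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang, xml:space, etc. under this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def inherited_xml_attrs(root, node):
    """Collect the xml:* attributes that ancestors of `node` put in scope.

    Hypothetical helper for illustration only.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = {}
    current = parent_of.get(node)
    while current is not None:
        for name, value in current.attrib.items():
            # The nearest ancestor wins, so never overwrite an earlier hit.
            if name.startswith(XML_NS) and name not in found:
                found[name] = value
        current = parent_of.get(current)
    return found


# Made-up sample document: xml:lang is overridden on <p>, xml:space is not.
doc = ET.fromstring(
    '<doc xml:lang="en"><section xml:space="preserve">'
    '<p xml:lang="fr"><em>bonjour</em></p></section></doc>'
)
em = doc.find(".//em")
attrs = inherited_xml_attrs(doc, em)
```

The walk needs the whole ancestor chain (here, a parent map over the full
tree), which is the bookkeeping that exc-c14n lets an implementation skip.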

> Because of this, in the LC docs we had a complicated and confusing
> work-around that involved putting the xml-literal inside an <rdf-wrapper>
> tag, whose sole purpose was to hold the xml:lang attribute. It is
> certainly less confusing to have ditched all of that.

Right, this is suggested in:

| http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/#sec-Limitations
| The XML being canonicalized may depend on the effect of XML namespace
| attributes, such as xml:lang, xml:space, and xml:base appearing in
| ancestor nodes. To avoid problems due to the non-importation of such
| attributes into an enveloped document subset, either they must be
| explicitly given in the apex nodes of the XML document subset being
| canonicalized or they must always be declared with an equivalent value in
| every context in which the XML document subset will be interpreted.

XML is a sufficiently complex beast that I've felt that there will never be
a perfect "canonical" representation. I'd like it if XML were simple enough
that there could be, but until then, the serialization is really an
application's decision, and our goal was to provide a sufficiently
expressive and small set of serializations that met the majority of
applications' requirements. In this case you can use:
1. c14n
2. exc-c14n and do an "apex node" restatement
3. exc-c14n and forget the xml attributes
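
As a rough sketch of the difference between options 2 and 3, the Python
below extracts a subset and either restates the in-scope xml:lang on its
apex node or leaves it out. Everything here is an assumption made for
illustration: the helper names nearest_lang and extract_subset, the sample
document, and above all the use of plain ElementTree serialization as a
stand-in for a real exc-c14n implementation.

```python
import copy
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def nearest_lang(root, node):
    """Return the xml:lang in scope for `node`, or None (hypothetical helper)."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    current = node
    while current is not None:
        if XML_LANG in current.attrib:
            return current.attrib[XML_LANG]
        current = parent_of.get(current)
    return None


def extract_subset(root, apex, restate_lang):
    """Serialize the subtree at `apex` on its own (hypothetical helper).

    Plain serialization is used here in place of exc-c14n.
    """
    subset = copy.deepcopy(apex)
    subset.tail = None  # drop text that followed the apex in the original
    if restate_lang:
        lang = nearest_lang(root, apex)
        if lang is not None:
            # Option 2: state the inherited value explicitly on the apex.
            subset.set(XML_LANG, lang)
    # Option 3 is the restate_lang=False path: the attribute is simply lost.
    return ET.tostring(subset, encoding="unicode")


# Made-up sample document.
doc = ET.fromstring('<doc xml:lang="en"><title>Hello <b>world</b></title></doc>')
title = doc.find("title")
with_lang = extract_subset(doc, title, restate_lang=True)
without_lang = extract_subset(doc, title, restate_lang=False)
```

With restatement the extracted title carries xml:lang="en" wherever it is
moved; without it, the language is whatever the new context supplies.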


[1] http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0171.html
[2] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2003JanMar/0030.html
Received on Monday, 30 June 2003 16:15:08 EDT
