Re: 2001-09-07#5 Literals from Jeremy Carroll on 2001-09-25 (w3c-rdfcore-wg@w3.org from September 2001)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Tue, 25 Sep 2001 12:52:55 +0100
To: <w3c-rdfcore-wg@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDAEDECCAA.jjc@hplb.hpl.hp.com>
I have been fixing a bug in ARP concerning rdf:parseType="Literal".

It was reported by Brian and was triggered by conflicts between the ARP
treatment and that of the Jena version of RDFFilter (Brian did the
rdf:parseType="Literal" code).

Looking in detail, neither parser conforms to the text that I posted
yesterday, despite the liberal intent of that text.

Also, I think that what Brian reported really was a defect, and we might
consider prohibiting it. (Question: how liberal do we want to be?)

The defect was that ARP does not escape any text in element content in a
literal.
e.g.
<rdf:value rdf:parseType="Literal"><foo>&lt;</foo></rdf:value>
is returned as "<foo><</foo>"

When writing the text, I certainly intended to permit that (although it is a
bad implementation).
However ARP does escape attribute value content so that:
<rdf:value rdf:parseType="Literal"><foo a="&lt;"/></rdf:value>
is returned as "<foo a='&lt;'></foo>"
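
To make the inconsistency concrete, here is a minimal Python sketch of a
serializer that escapes attribute values but passes element text through
untouched. It is not ARP's actual (Java) code; the function name and its
signature are hypothetical and exist only to reproduce the two outputs above.

```python
from xml.sax.saxutils import escape

def serialize_inconsistent(tag, attrs, text):
    """Hypothetical serializer mimicking the reported behaviour:
    attribute values are escaped, element text is not."""
    attr_str = "".join(f" {name}='{escape(value)}'" for name, value in attrs.items())
    # Bug being illustrated: `text` is emitted without any escaping.
    return f"<{tag}{attr_str}>{text}</{tag}>"

# The parser has already expanded &lt; to "<" in both inputs.
text_case = serialize_inconsistent("foo", {}, "<")
attr_case = serialize_inconsistent("foo", {"a": "<"}, "")
```

The same input character "<" comes out unescaped in one position and as
&lt; in the other, which is the inconsistency at issue.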

Para 48 is intended to require that implementations are at least consistent.
ARP is not, and so should be non-conformant.
[I am, of course, fixing ARP!]

===
[48]
   NOTE: The meaning of 'all' in the above paragraphs is that
   an RDF processing environment that makes such a change
   in one instance in one literal MUST make the corresponding
   change in every instance in every literal.
===

Moreover, Brian's code does replace the character references more or less as
described in paras [43] and [44].

====
[43]
  - all attribute values can be normalized as in XML
    canonicalization viz., replacing :-
    . all ampersands (&) with &amp;
    . all open angle brackets (<) with &lt;
    . all quotation mark characters with &quot;
    . all whitespace characters #x9, #xA, and #xD, with character
      references.

[44]
  - all text nodes can be normalized as in XML
    canonicalization viz., replacing :-
    . all ampersands (&) with &amp;
    . all open angle brackets (<) with &lt;
    . all closing angle brackets (>) with &gt;
    . all #xD characters with &#xD;.
====
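
The replacements listed in [43] and [44] can be sketched in Python as
follows. This is an illustration only, not Brian's code or ARP's; the
function names are made up, and the hexadecimal form of the whitespace
references (&#x9;, &#xA;, &#xD;) is taken from XML canonicalization.

```python
def escape_attribute_value(value: str) -> str:
    """Replacements for attribute values, per [43]."""
    # '&' must be replaced first, so the ampersands introduced by the
    # later replacements are not escaped a second time.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;")
                 .replace("\t", "&#x9;")
                 .replace("\n", "&#xA;")
                 .replace("\r", "&#xD;"))

def escape_text_node(text: str) -> str:
    """Replacements for text nodes, per [44]."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\r", "&#xD;"))
```

Note that the two lists differ: quotation marks, #x9 and #xA are replaced
only in attribute values, and ">" only in text nodes.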

However, he doesn't follow the XML Canonicalization spec, and really, why
should he? (That is in the spirit of RECOMMENDING canonicalization while
permitting, as a MAY, any coherent behaviour.)

So, I am suggesting weakening [43] and [44] to allow more arbitrary character
reference replacements. The final sentence of each links the two (XML
canonicalization has similar but not identical processing ...).

====
[43']
 - all expanded attribute values can be further processed by replacing any
character with an appropriate numeric character reference or an XML
predefined entity reference (i.e. &lt;, &gt;, &amp;, &apos; or &quot;). All
identical characters MUST be processed identically. If such processing
applies, similar processing MUST be applied to text nodes.

[44']
  - all expanded text nodes can be further processed by replacing any
character with an appropriate numeric character reference or an XML
predefined entity reference. All identical characters MUST be processed
identically. If such processing applies, similar processing MUST be applied
to attribute values.

====
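
One possible reading of [43'] and [44'] is a single replacement table that
an implementation chooses once and then applies to both attribute values
and text nodes. The Python sketch below assumes that reading; the names
make_policy and apply_policy are hypothetical, and using the very same
table for both kinds of content is only one way of interpreting "similar
processing".

```python
# The five XML predefined entity references named in [43'].
PREDEFINED = {"<": "&lt;", ">": "&gt;", "&": "&amp;", "'": "&apos;", '"': "&quot;"}

def make_policy(chars_to_replace):
    """Build a translation table for the chosen characters. Characters with
    a predefined entity use it; all others get a numeric reference."""
    table = {}
    for ch in chars_to_replace:
        table[ord(ch)] = PREDEFINED.get(ch, f"&#x{ord(ch):X};")
    return table

def apply_policy(table, content):
    """Apply the table to any string. A table lookup treats every identical
    character identically, wherever it occurs."""
    return content.translate(table)

# An implementation picks its own set of characters; this set is arbitrary.
policy = make_policy("<&\r")
attr_out = apply_policy(policy, "a<b&c")   # an expanded attribute value
text_out = apply_policy(policy, "x<y\r")   # an expanded text node
```

Because both kinds of content go through the same table, a character
replaced in one literal is replaced the same way in every other, which is
also what para [48] asks for.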

Jeremy
Received on Tuesday, 25 September 2001 07:54:18 UTC