- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Tue, 25 Sep 2001 12:52:55 +0100
- To: <w3c-rdfcore-wg@w3.org>
I have been fixing a bug in ARP concerning rdf:parseType="Literal". It was reported by Brian and was triggered by conflicts between the ARP treatment and that of the Jena version of RDFFilter (Brian did the rdf:parseType="Literal" code).

Looking in detail, neither parser conforms to the text that I posted yesterday, despite the liberal intent of that text. Also, I think that what Brian reported really was a defect, and we might consider prohibiting it. (Question: how liberal do we want to be?)

The defect was that ARP does not escape any text in element content in a literal, e.g.

  <rdf:value rdf:parseType="Literal"><foo>&lt;</foo></rdf:value>

is returned as

  "<foo><</foo>"

I certainly intended when writing the text to permit that (although it is a bad implementation). However, ARP does escape attribute value content, so that:

  <rdf:value rdf:parseType="Literal"><foo a="&lt;"/></rdf:value>

is returned as

  "<foo a='&lt;'></foo>"

Para [48] is intended to require that implementations are at least consistent. ARP is not, and so should be non-conformant. [I am, of course, fixing ARP!]

===
[48] NOTE: The meaning of 'all' in the above paragraphs is that an RDF processing environment that makes such a change in one instance in one literal MUST make the corresponding change in every instance in every literal.
===

Moreover, Brian's code does replace the character references more or less as described in paras [43] and [44].

====
[43] - all attribute values can be normalized as in XML canonicalization, viz. replacing:
  . all ampersands (&) with &amp;
  . all open angle brackets (<) with &lt;
  . all quotation mark characters with &quot;
  . all whitespace characters #x9, #xA, and #xD with character references.

[44] - all text nodes can be normalized as in XML canonicalization, viz. replacing:
  . all ampersands (&) with &amp;
  . all open angle brackets (<) with &lt;
  . all closing angle brackets (>) with &gt;
  . all #xD characters with &#xD;.
====

However, he doesn't follow the XML Canonicalization specs, and really, why should he (in the spirit of RECOMMENDING canonicalization but MAYing any coherent behaviour)?

So, I am suggesting weakening [43] and [44] to allow more arbitrary character reference replacements. The final sentence of each links the two (XML canonicalization has similar but not identical processing ...).

====
[43'] - all expanded attribute values can be further processed by replacing any character with an appropriate numeric character reference or an XML predefined entity reference (i.e. &lt;, &gt;, &amp;, &apos; or &quot;). All identical characters MUST be processed identically. If such processing applies, similar processing MUST be applied to text nodes.

[44'] - all expanded text nodes can be further processed by replacing any character with an appropriate numeric character reference or an XML predefined entity reference. All identical characters MUST be processed identically. If such processing applies, similar processing MUST be applied to attribute values.
====

Jeremy
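
The consistency requirement in [43'] and [44'] could be sketched roughly as follows. This is an illustration only, not code from ARP or Jena: the names make_escaper and serialize are hypothetical, and the choice of which characters to replace is an assumption made for the example. The point is that one replacement table is shared by text nodes and attribute values, so identical characters are always processed identically.

```python
# Hypothetical sketch; nothing here is taken from ARP or the Jena RDFFilter.
# The five XML predefined entity references.
PREDEFINED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&apos;", '"': "&quot;"}


def make_escaper(entity_chars, numeric_chars=""):
    """Build a single escaping function from one replacement table."""
    table = {ch: PREDEFINED[ch] for ch in entity_chars}
    for ch in numeric_chars:
        # Numeric character reference in hexadecimal, e.g. "\r" becomes &#xD;
        table[ch] = "&#x%X;" % ord(ch)

    def escape(s):
        return "".join(table.get(ch, ch) for ch in s)

    return escape


# Assumed policy for this example: escape &, <, > and ' by entity, #xD numerically.
escape = make_escaper("&<>'", numeric_chars="\r")


def serialize(name, attrs, text):
    """Serialize one element, using the same escaper for attributes and text."""
    attr_str = "".join(" %s='%s'" % (k, escape(v)) for k, v in attrs.items())
    return "<%s%s>%s</%s>" % (name, attr_str, escape(text), name)
```

With this policy, serialize("foo", {"a": "<"}, "<") gives "<foo a='&lt;'>&lt;</foo>", so the open angle bracket is treated the same way in both positions, which is the behaviour that ARP lacked.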
Received on Tuesday, 25 September 2001 07:54:18 UTC