charmod-uri draft message from Jeremy Carroll on 2002-05-02 (w3c-rdfcore-wg@w3.org from May 2002)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 2 May 2002 17:07:54 +0100
To: <w3c-rdfcore-wg@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDOEOMCDAA.jjc@hplb.hpl.hp.com>
Draft:
 - for www-rdf-interest
       www-webont-wg
       uri
       w3c-i18n-ig
       xml-names-editor@w3.org

The RDF Core WG has decided [1] that the URI ref labels used in an RDF graph are in Unicode rather than US-ASCII.
These are seen as "original character sequences" in the terminology of RFC 2396 (see [10]). 

This is motivated by the clarity in XML Erratum 26 [2] (and equally in XLink section 5.4) that the %-escaping algorithm that is used for mapping international characters into US-ASCII is "performed only when absolutely necessary and as late as possible in a processing chain". 

In the terms of the text from XLink [3], the labels on the RDF graph "must be a URI reference as defined in [IETF RFC 2396], or must result in a URI reference after the escaping procedure [...] is applied."

Moreover, the WG decided, guided by charmod [4], that such labels should conform to Unicode Normalization Form C (NFC).
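
As an illustration only, here is a minimal Python sketch of what checking a label against Normal Form C could look like. The helper name is_nfc and the two sample labels are assumptions made for this example; they are not taken from the WG decision or the test cases.

```python
import unicodedata


def is_nfc(label: str) -> bool:
    """Return True if the label is unchanged by NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", label) == label


# The same visible text, spelled with different code point sequences.
composed = "http://example.org/#Andr\u00e9"     # e-acute as a single code point
decomposed = "http://example.org/#Andre\u0301"  # "e" followed by a combining acute accent

print(is_nfc(composed))
print(is_nfc(decomposed))
# Compared code point by code point, the two labels differ even though they render alike.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The two labels look identical when displayed but are different character sequences, which is the kind of confusion that a conformance requirement on Normal Form C is meant to rule out.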

For those unfamiliar with the RDF graph but familiar with XML Namespaces, the analogous decision would be that the identifier of an XML Namespace "must be a URI reference as defined in [IETF RFC 2396], or must result in a URI reference after the escaping procedure [...] is applied." and that the escaping procedure is *not* applied before considering whether two such identifiers are identical.

The decision was made with dissent, and the RDF Core WG welcomes feedback, both positive and negative, to www-rdf-comments@w3.org. We expect to review this decision in the light of the feedback.

We note that the current WD of Namespaces in XML 1.1 [5] does *not* follow the text from XLink or Erratum 26, and would particularly value feedback from both the XML Names editors and the I18N IG on this issue.

The test cases agreed to represent this issue are found in [6] and [7].

Detailed rationale:
- the normative dependency of M&S on XML Erratum 26, see [8]
- the possibility of URI fraud with non-NFC international URIs, see [9]
- a test case that is unclear under M&S.
- an XML Namespace test case that is similar.
  See appendix below.


An alternative would be to specify that %-escaping is applied before forming the graph. This is in line with common practice in the RDF community.
The test case below indicates that clarity would then require specifying the case of the hex digits used in %-escapes, so that clarification would involve adding to the well-known %-escaping algorithm of RFC 2396.
Within the RDF/XML serialization the URI fraud issue would be addressed by requiring the XML to be fully normalized.
This would not address the security issue in non-XML uses of RDF, in which any reverse %-encoding (necessary for legibility in I18N environments) may introduce the problematic cases.
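
To make the case problem concrete, here is a rough Python sketch of the escape-before-forming-the-graph alternative, applied to the three labels from the RDF/XML test case in the appendix. The helpers escape_label and uppercase_escapes and the SAFE character set are assumptions made for this illustration. Python's urllib.parse.quote happens to emit upper-case hex digits; that is a property of that library, not something RFC 2396 requires.

```python
import re
from urllib.parse import quote

# Assumption: URI punctuation and any existing "%" are left alone; everything
# else outside ASCII letters, digits and "_.-~" is escaped as UTF-8 octets.
SAFE = ":/?#[]@!$&'()*+,;=%"


def escape_label(label: str) -> str:
    """%-escape non-ASCII characters in a label (hypothetical helper)."""
    return quote(label, safe=SAFE)


def uppercase_escapes(label: str) -> str:
    """Extra rule beyond RFC 2396: force the hex digits of each escape to upper case."""
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), label)


typed_node = escape_label("http://example.org/#Andr\u00e9")
foo_object = escape_label("http://example.org/#Andr%c3%a9")
bar_object = escape_label("http://example.org/#Andr%C3%A9")

print(typed_node)
# With this library the typed node matches the upper-case form only.
print(typed_node == bar_object, typed_node == foo_object)
# Only after the additional case rule do all three labels coincide.
print(uppercase_escapes(foo_object) == bar_object)
```

An escaper that emitted lower-case hex digits would pair the typed node with the other object instead, which is why the case would have to be specified.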
  


Jeremy Carroll
HP Rep on RDF Core
Issue owner


References

[1]
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/0474.html
[2]
http://www.w3.org/XML/xml-V10-2e-errata#E26
[3]
http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators
[4]
http://www.w3.org/TR/2002/WD-charmod-20020430 
[5]
http://www.w3.org/TR/2002/WD-xml-names11-20020403/
[6]
http://www.w3.org/2000/10/rdf-tests/rdfcore/rdf-charmod-uris
[7]
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/0116.html
[8]
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0012.html
[9]
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0027.html
[10]
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002May/0008.html


More test cases

RDF/XML

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:eg="http://example.org/#">
  <eg:André rdf:about="http://example.org/#x">
    <eg:foo rdf:resource="http://example.org/#Andr%c3%a9"/>
    <eg:bar rdf:resource="http://example.org/#Andr%C3%A9"/>
  </eg:André>
</rdf:RDF>

Is the label on the typed node the same as the label on the <eg:foo> object or the <eg:bar> object
(or are these all the same)?
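
For comparison, here is a small Python sketch of the reading in which labels are Unicode character sequences and no %-escaping is applied before comparing them. The variable names are made up for this example, and the typed node label is assumed to be the namespace name concatenated with the local name.

```python
ns = "http://example.org/#"

typed_node = ns + "Andr\u00e9"  # from the element name eg:André
foo_object = ns + "Andr%c3%a9"  # rdf:resource value on eg:foo
bar_object = ns + "Andr%C3%A9"  # rdf:resource value on eg:bar

# Plain string comparison, with no escaping or unescaping applied first.
distinct_labels = {typed_node, foo_object, bar_object}
print(len(distinct_labels))
```

Under that reading the graph would contain three distinct labels, since no two of the strings are equal character for character.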


XML Names

<eg xmlns:a="http://example.org/#Andr%c3%a9" 
   xmlns:b="http://example.org/#Andr%C3%A9"
   xmlns:c="http://example.org/#André"
   a:a="a" b:a="b" c:a="a" />

If this is not well formed, why? (Under the current spec and the latest WD, is it because é is not US-ASCII?)
What about the three other cases formed by deleting one of the last three attributes?
Received on Thursday, 2 May 2002 12:08:51 UTC