RE: charmod-uri draft message from Jeremy Carroll on 2002-05-02 (w3c-rdfcore-wg@w3.org from May 2002)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 2 May 2002 20:33:16 +0100
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDEEOPCDAA.jjc@hplb.hpl.hp.com>
I should have linked to the dissent messages as well, how about adding the following as the last paragraph (before my name)

[[
The dissenters primarily considered this as a significant change to both RDF and the understanding of RFC2396 and hence an inappropriate step for rdfcore WG, independent of the technical merits or otherwise of the decision. [11] [12]

[11]
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/0537.html
[12]
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/0534.html
]]

(I note that the message I pulled out for Aaron (0537) is mid-thread, but I judge it as powerfully expressing his position).

Jeremy
  -----Original Message-----
  From: w3c-rdfcore-wg-request@w3.org [mailto:w3c-rdfcore-wg-request@w3.org]On Behalf Of Jeremy Carroll
  Sent: 02 May 2002 17:08
  To: w3c-rdfcore-wg@w3.org
  Subject: charmod-uri draft message



  Draft:
   - for www-rdf-interest
         www-webont-wg
         uri
         w3c-i18n-ig
         xml-names-editor@w3.org

  The RDF Core WG has decided [1] that the URI ref labels used in an RDF graph are in Unicode rather than US-ASCII.
  These are seen as "original character sequences" in the terminology of RFC 2396 (see [10]). 

  This is motivated by the clarity in XML Erratum 26 [2] (and equally in XLink section 5.4) that the %-escaping algorithm that is used for mapping international characters into US-ASCII is "performed only when absolutely necessary and as late as possible in a processing chain". 

  In the terms of the text from xlink [3] the labels on the RDF graph  "must be a URI reference as defined in [IETF RFC 2396], or must result in a URI reference after the escaping procedure [...] is applied."

  Moreover, the WG decided, guided by charmod [4] that such labels should conform with Unicode Normal Form C.

  For those unfamiliar with the RDF graph but familiar with XML Namespaces, the analogous decision would be that the identifier of an XML Namespaces "must be a URI reference as defined in [IETF RFC 2396], or must result in a URI reference after the escaping procedure [...] is applied." and that the escaping procedure is *not* applied before considering whether two such identiers are identical.

  The decision was with dissent, and RDF Core WG welcomes feedback both positive and negative to www-rdf-comments@w3.org. We expect to review this decision in the light of the feedback.

  We note that the current WD of Namespaces in XML 1.1 [5] does *not* follow the text from xlink or erratum 26, and would particularly value feedback from both the XML Names editors and I18N IG on this issue.

  The test cases agreed to represent this issue are found in [6], and [7]

  Detailed rationale:
  - the normative dependency of M&S on XML Erratum 26, see  [8]
  - the possibility of URI fraud with non NFC international URIs see [9]
  - a test case that is unclear under M&S.
  - an XML Namespace test case that is similar.
    See appendix below.


  An alternative would be specifying %-escaping before forming the graph. This is in line with common practice in the RDF community.
  The test case below indicates that clarity would require specifying the case of % escaping. So that clarification
  would involve adding to the well-known % escaping algorithm of RFC 2396.
  Within the RDF/XML serialization the URI fraud issue would be addressed by requiring the XML to be fully normalized. 
  This would not address the security issue in non-XML uses of RDF, in which any reverse %-encoding (necessary for legibility in I18N environments) may introduce the problematic cases.
    


  Jeremy Carroll
  HP Rep on RDF Core
  Issue owner


  References

  [1]
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/0474.html
  [2]
  http://www.w3.org/XML/xml-V10-2e-errata#E26
  [3]
  http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators
  [4]
  http://www.w3.org/TR/2002/WD-charmod-20020430 
  [5]
  http://www.w3.org/TR/2002/WD-xml-names11-20020403/
  [6]
  http://www.w3.org/2000/10/rdf-tests/rdfcore/rdf-charmod-uris
  [7]
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/0116.html
  [8]
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0012.html
  [9]
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0027.html
  [10]
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002May/0008.html


  More test cases

  RDF/XML

  <rdf:RDF
      xmlns:eg="http://example.org/#">
    <eg:André rdf:about="http://example.org/#x">
      <eg:foo rdf:resource="http://example.org/#Andr%c3%a9"/>
      <eg:bar rdf:resource="http://example.org/#Andr%C3%A9"/>
    </eg:André>
  </rdf:RDF>

  Is the label on the typed node the same as the label on the <eg:foo> object or the <eg:bar> object,
  (or are they the same).


  XML Names

  <eg xmlns:a="http://example.org/#Andr%c3%a9" 
     xmlns:b="http://example.org/#Andr%C3%A9"
     xmlns:c="http://example.org/#Andr��
     a:a="a" b:a="b" c:a="a" />

  If this is not well formed, why? (under the current spec and the latest WD because é is not US ASCII?) 
  What about the three other cases formed by deleting one of the last three attributes.
Received on Thursday, 2 May 2002 15:33:52 UTC