- From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
- Date: Wed, 17 Oct 2001 10:21:12 +0100
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Cc: w3c-rdfcore-wg@w3.org
At 09:35 AM 10/17/01 +0100, Jeremy Carroll wrote:
[...]
>Dave's text is currently neutral vis-a-vis internationalized URI's.
>Aaron's text takes a URI-ref to be precisely a US-ASCII URI ala RFC
>2396.
>
>I owe the group some work on internationalization, but currently am of
>the opinion that we should allow internationalized URIs wherever we
>allow uri-references; these being resolved into US-ASCII URIs ala RFC
>2396 (as modified by RFC 2732) at the same time as relative URI's are
>resolved, using the standard algorithm.

I've been in discussions with the I18N group about URIs in CC/PP (which is an application of RDF). Basically, their position (as I understand it) is that URIs in an XML document should be regarded as an "original character sequence" rather than a "URI character sequence" (see RFC 2396, section 2.1). Then, when a URI is dereferenced, or is otherwise required in "URI character sequence" form, the appropriate transformation to an octet sequence is performed (dependent on the code point set used for the XML document), and URI escaping (%hh) is then applied to yield a "URI character sequence".

If the XML document uses Unicode characters, then the required octet encoding is UTF-8, which provides an unambiguous interpretation for URIs in XML. If other character sets are used, then the interpretation is application-dependent, but I presume that the use of non-Unicode code point sets is generally discouraged for new data.

There is some language in the XML linking spec (http://www.w3.org/TR/xlink/#link-locators) that I am planning to adapt for the CC/PP spec:

[[[
The value of the href attribute must be a URI reference as defined in [IETF RFC 2396], or must result in a URI reference after the escaping procedure described below is applied. The procedure is applied when passing the URI reference to a URI resolver.
Some characters are disallowed in URI references, even if they are allowed in XML; the disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the number sign (#) and percent sign (%) and the square bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped as follows:

- Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one or more bytes.
- Any bytes corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
- The original character is replaced by the resulting character sequence.
]]]

I contend that this approach is reasonable, but it is not currently documented in any W3C Recommendation in a way that suggests it applies to all URIs in an XML document. Notwithstanding, I predict that I18N will strongly request that we adopt this approach (at least for rdf:about and rdf:resource).

[Later: I note that the XML Schema anyURI datatype (http://www.w3.org/TR/xmlschema-2/#anyURI) refers to the XML Linking language text quoted above. If we said that the attribute values of rdf:about and rdf:resource were 'anyURI' per XML Schema datatypes, I think the rest would follow.]

#g
------------------------------------------------------------
Graham Klyne
MIMEsweeper Group Strategic Research
<http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------
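The escaping procedure quoted from the XLink spec can be sketched in Python roughly as follows. This is an illustrative sketch, not code from XLink, RDF or CC/PP: the names `escape_href`, `is_disallowed` and `EXCLUDED_ASCII` are hypothetical, the input is assumed to be the attribute value as a Unicode string (as it would be after XML parsing), and no relative-URI resolution is attempted. The disallowed ASCII set is taken from the RFC 2396 section 2.4 excluded characters (controls, space, delims, unwise) minus "#", "%", "[" and "]".

```python
# Hypothetical sketch of the XLink href escaping procedure (not an official API).

# RFC 2396 excluded printable ASCII characters, minus '#' and '%' (kept per XLink)
# and '[' / ']' (re-allowed by RFC 2732): space, delims and unwise characters.
EXCLUDED_ASCII = set(' <>"{}|\\^`')


def is_disallowed(ch: str) -> bool:
    """Return True if ch must be escaped before passing the href to a resolver."""
    code = ord(ch)
    if code > 0x7F:
        return True  # every non-ASCII character is disallowed
    if code <= 0x1F or code == 0x7F:
        return True  # US-ASCII control characters
    return ch in EXCLUDED_ASCII


def escape_href(href: str) -> str:
    """Replace each disallowed character with %HH escapes of its UTF-8 bytes."""
    out = []
    for ch in href:
        if is_disallowed(ch):
            # Convert the character to one or more UTF-8 bytes, then %-escape
            # each byte (this sketch chooses uppercase hex digits).
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)  # allowed characters, including '%' and '#', pass through
    return "".join(out)


if __name__ == "__main__":
    print(escape_href("http://example.org/r\u00e9sum\u00e9#top"))
```

Run on `http://example.org/résumé#top`, this yields `http://example.org/r%C3%A9sum%C3%A9#top`: each "é" (U+00E9) becomes its two UTF-8 bytes, while "#" is left alone. Because "%" is never escaped, an already-escaped sequence such as `%41` passes through unchanged, which matches the quoted rule that the percent sign is not among the disallowed characters.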
Received on Wednesday, 17 October 2001 10:41:27 UTC