- From: François Yergeau <francois@yergeau.com>
- Date: Wed, 11 Jan 2006 07:49:20 -0800
- To: public-xml-core-wg@w3.org
I have an action to craft uniform language for IRIs in XLink 1.1, XLink 1.0, xml:base, xinclude, XML 1.0, and XML 1.1 (as errata for all but XLink 1.1). This is my attempt (still missing XLink 1.0, and not perfectly uniform):

XLink 1.1
===========================================================
section 5.4 [http://www.w3.org/TR/2005/WD-xlink11-20050707/#link-locators]

Change from:
-------------------------
The value of the href attribute must be an IRI reference as defined in [IETF RFC 3987] or must result in an IRI reference after the escaping procedure described below is applied. (By design, all URIs (Uniform Resource Identifiers) as defined in [IETF RFC 3986] are also IRIs.)

XLink 1.0 described a procedure for escaping characters found in the href attribute value that were not allowed in URIs. For XLink 1.1, those details are normatively described in Section 3.1 of [IETF RFC 3987]. However, for backwards compatibility, XLink 1.1 processors must escape one additional character, the space. All occurrences of a space in the value of an href attribute must be replaced by %20.
-------------------------
to:
-------------------------
The value of the href attribute must be an IRI reference as defined in [IETF RFC 3987] or must result in an IRI reference after the escaping procedure described below is applied. (By design, all URIs (Uniform Resource Identifiers) as defined in [IETF RFC 3986] are also IRIs.)

To convert the value of the href attribute to an IRI reference, the following characters must be escaped:

* the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML)
* space #x20
  Note: Authors are advised to avoid unescaped spaces, as XML Schema has identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and ` #x60

These characters are escaped by applying to them steps 2.1 to 2.3 of Section 3.1 of [IETF RFC 3987].
If necessary for the implementation, an IRI reference is converted to a URI reference according to the prescriptions of Section 3.1 of [IETF RFC 3987]. The two conversions (href value to IRI reference, IRI reference to URI reference) may be merged.
-------------------------
[first para unchanged, rest adapted from XInclude]

XML Base 1.0
===========================================================
section 3.0 [http://www.w3.org/TR/2001/REC-xmlbase-20010627/#syntax]

Change from:
-------------------------
The value of this attribute is interpreted as a URI Reference as defined in RFC 2396 [IETF RFC 2396], after processing according to Section 3.1.
-------------------------
to:
-------------------------
The value of this attribute is interpreted as an IRI Reference as defined in RFC 3987 [IETF RFC 3987], after processing according to Section 3.1.
-------------------------

section 3.1 [http://www.w3.org/TR/2001/REC-xmlbase-20010627/#escaping]

Change from:
-------------------------
The set of characters allowed in xml:base attributes is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed from URI references, and thus processors must encode and escape these characters to obtain a valid URI reference from the attribute value.

The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped as follows:

1. Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one or more bytes.
2. Any bytes corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
3. The original character is replaced by the resulting character sequence.
-------------------------
to:
-------------------------
To convert the value of the xml:base attribute to an IRI reference, the following characters must be escaped:

* the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML)
* space #x20
  Note: Authors are advised to avoid unescaped spaces, as XML Schema has identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and ` #x60

These characters are escaped by applying to them steps 2.1 to 2.3 of Section 3.1 of [IETF RFC 3987].

If necessary for the implementation, an IRI reference is converted to a URI reference according to the prescriptions of Section 3.1 of [IETF RFC 3987]. The two conversions (xml:base value to IRI reference, IRI reference to URI reference) may be merged.
-------------------------

XInclude 1.0 section 4.1.1 [http://www.w3.org/TR/2004/REC-xinclude-20041220/#IRIs]
===========================================================

Change from:
-------------------------
The href attribute value is converted to either a URI reference or an IRI reference, as appropriate to the implementation.

Work is currently in progress to produce an RFC defining Internationalized Resource Identifiers (IRIs). Since this work is not yet complete, in this section we define IRI references syntactically. We expect to issue an erratum replacing portions of this section with a reference to the RFC when it is published. For a more general definition and discussion of IRIs see [IRI draft] (work in progress).

[Definition: An IRI reference is a string that can be converted to a URI reference by escaping the following additional characters:]

* the Unicode plane 0 characters #xA0 - #xD7FF, #xF900-#xFDCF, #xFDF0-#xFFEF
* the Unicode plane 1-14 characters #x10000-#x1FFFD ... #xE0000-#xEFFFD

To convert the value of the href attribute to an IRI reference, the following characters must be escaped:

* space #x20
  Note: Authors are advised to avoid unescaped spaces, as XML Schema has identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and ` #x60

These characters are escaped as follows:

1. Each additional character is converted to UTF-8 [Unicode] as one or more bytes.
2. The resulting bytes are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
3. The original character is replaced by the resulting character sequence.

To convert an IRI reference to a URI reference, the additional characters allowed in IRIs must be escaped using the same method.
-------------------------
to:
-------------------------
The value of the href attribute must be an IRI reference as defined in [IETF RFC 3987] or must result in an IRI reference after the escaping procedure described below is applied. (By design, all URIs (Uniform Resource Identifiers) as defined in [IETF RFC 3986] are also IRIs.)

To convert the value of the href attribute to an IRI reference, the following characters must be escaped:

* the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML)
* space #x20
  Note: Authors are advised to avoid unescaped spaces, as XML Schema has identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and ` #x60

These characters are escaped by applying to them steps 2.1 to 2.3 of Section 3.1 of [IETF RFC 3987].

If necessary for the implementation, an IRI reference is converted to a URI reference according to the prescriptions of Section 3.1 of [IETF RFC 3987]. The two conversions (href value to IRI reference, IRI reference to URI reference) may be merged.
-------------------------

XML 1.0 section 4.2.2 [http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent]
XML 1.1 section 4.2.2 [http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-external-ent]
===========================================================

Change from:
-------------------------
System identifiers (and other XML strings meant to be used as URI references) MAY contain characters that, according to [IETF RFC 2396] and [IETF RFC 2732], must be escaped before a URI can be used to retrieve the referenced resource. The characters to be escaped are the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML), space #x20, the delimiters '<' #x3C, '>' #x3E and '"' #x22, the unwise characters '{' #x7B, '}' #x7D, '|' #x7C, '\' #x5C, '^' #x5E and '`' #x60, as well as all characters above #x7F. Since escaping is not always a fully reversible process, it MUST be performed only when absolutely necessary and as late as possible in a processing chain. In particular, neither the process of converting a relative URI to an absolute one nor the process of passing a URI reference to a process or software component responsible for dereferencing it SHOULD trigger escaping. When escaping does occur, it MUST be performed as follows:

1. Each character to be escaped is represented in UTF-8 [Unicode3] as one or more bytes.
2. The resulting bytes are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
3. The original character is replaced by the resulting character sequence.
-------------------------
to:
-------------------------
System identifiers (and other XML strings meant to be used as URI references) MAY contain characters that, according to [IETF RFC 3986], must be escaped before a URI can be used to retrieve the referenced resource. This escaping MUST be performed following the prescriptions of Section 3.1 of [IETF RFC 3987], including the escaping (optional in RFC 3987) of the following characters:

* the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML)
* space #x20
  Note: Authors are advised to avoid unescaped spaces, as XML Schema has identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and ` #x60
-------------------------
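
For illustration only, here is a rough Python sketch of the two conversions the proposed wording describes: attribute value to IRI reference, then IRI reference to URI reference, plus the merged form. The function names are hypothetical and appear in none of the specifications. The sketch assumes that steps 2.1 to 2.3 of Section 3.1 of RFC 3987 amount to encoding the character as UTF-8 and writing each octet as %HH in uppercase hex. As a simplification it treats every character above #x7F as needing escaping in the IRI-to-URI step, and it ignores any special treatment of host names.

```python
# Sketch only: names below are hypothetical, not taken from any specification.

# Characters the proposed text lists for escaping when converting an
# attribute value to an IRI reference.
_LISTED = (
    set(range(0x00, 0x20))                # control characters #x0 to #x1F
    | {0x7F}                              # control character #x7F
    | {0x20}                              # space
    | {ord(c) for c in '<>"'}             # delimiters
    | {ord(c) for c in '{}|\\^`'}         # unwise characters
)


def _percent_encode(ch):
    """Encode one character as UTF-8 and write each octet as %HH."""
    return "".join("%{:02X}".format(octet) for octet in ch.encode("utf-8"))


def to_iri_reference(value):
    """Attribute value -> IRI reference: escape only the listed characters."""
    return "".join(
        _percent_encode(c) if ord(c) in _LISTED else c for c in value
    )


def to_uri_reference(iri):
    """IRI reference -> URI reference (simplified: escape all non-ASCII)."""
    return "".join(_percent_encode(c) if ord(c) > 0x7F else c for c in iri)


def to_uri_reference_merged(value):
    """Both conversions in a single pass, as the proposed text permits."""
    return "".join(
        _percent_encode(c) if ord(c) in _LISTED or ord(c) > 0x7F else c
        for c in value
    )


if __name__ == "__main__":
    print(to_iri_reference("my doc<1>.xml"))
    print(to_uri_reference_merged("my r\u00e9sum\u00e9.xml"))
```

Note that the percent sign and the number sign are left untouched in this sketch, so a value that is already escaped, or that carries a fragment identifier, passes through unchanged.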
Received on Wednesday, 11 January 2006 15:49:14 UTC