- From: François Yergeau <francois@yergeau.com>
- Date: Wed, 11 Jan 2006 07:49:20 -0800
- To: public-xml-core-wg@w3.org
I have an action to craft uniform language for IRIs in XLink 1.1, XLink
1.0, XML Base, XInclude, XML 1.0, and XML 1.1 (as errata for all but
XLink 1.1). This is my attempt (still missing XLink 1.0, and not
perfectly uniform):
XLink 1.1
===========================================================
section 5.4 [http://www.w3.org/TR/2005/WD-xlink11-20050707/#link-locators]
Change from:
-------------------------
The value of the href attribute must be an IRI reference as defined in
[IETF RFC 3987] or must result in an IRI reference after the escaping
procedure described below is applied. (By design, all URIs (Uniform
Resource Identifiers) as defined in [IETF RFC 3986] are also IRIs.)
XLink 1.0 described a procedure for escaping characters found in the
href attribute value that were not allowed in URIs. For XLink 1.1, those
details are normatively described in Section 3.1 of [IETF RFC 3987].
However, for backwards compatibility, XLink 1.1 processors must escape
one additional character, the space. All occurrences of a space in the
value of an href attribute must be replaced by %20.
-------------------------
to:
-------------------------
The value of the href attribute must be an IRI reference as defined in
[IETF RFC 3987] or must result in an IRI reference after the escaping
procedure described below is applied. (By design, all URIs (Uniform
Resource Identifiers) as defined in [IETF RFC 3986] are also IRIs.)
To convert the value of the href attribute to an IRI reference, the
following characters must be escaped:
* the control characters #x0 to #x1F and #x7F (most of which cannot
appear in XML)
* space #x20
Note: Authors are advised to avoid unescaped spaces, as XML
Schema has identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and
` #x60
These characters are escaped by applying to them steps 2.1 to 2.3 of
Section 3.1 of [IETF RFC 3987].
If necessary for the implementation, an IRI reference is converted to a
URI reference according to the prescriptions of Section 3.1 of [IETF RFC
3987]. The two conversions (href value to IRI reference, IRI reference
to URI reference) may be merged.
-------------------------
[first para unchanged, rest adapted from XInclude]
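To make the proposed escaping concrete, here is a minimal sketch in Python of the first conversion (href value to IRI reference). The names ESCAPE_SET, percent_encode and href_to_iri_reference are hypothetical, chosen for illustration only. The sketch assumes the attribute value is already available as a string of Unicode characters, and it applies steps 2.1 to 2.3 of Section 3.1 of RFC 3987 (UTF-8 encode, write each byte as %HH, replace the original character) only to the characters listed above.

```python
# Characters the proposed text says must be escaped (all are ASCII).
ESCAPE_SET = (
    {chr(c) for c in range(0x00, 0x20)}  # control characters #x0 to #x1F
    | {"\x7f"}                           # control character #x7F
    | {" "}                              # space #x20
    | set('<>"')                         # delimiters
    | set("{}|\\^`")                     # unwise characters
)


def percent_encode(ch: str) -> str:
    """Steps 2.1 and 2.2: UTF-8 encode, then write each byte as %HH."""
    return "".join(f"%{b:02X}" for b in ch.encode("utf-8"))


def href_to_iri_reference(value: str) -> str:
    """Step 2.3: replace each listed character by its escaped form."""
    return "".join(
        percent_encode(ch) if ch in ESCAPE_SET else ch for ch in value
    )


print(href_to_iri_reference("my doc.xml"))  # my%20doc.xml
```

Non-ASCII characters, '#' and '%' are deliberately left alone here, since the result is meant to be an IRI reference rather than a URI reference.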
XML Base 1.0
===========================================================
section 3.0 [http://www.w3.org/TR/2001/REC-xmlbase-20010627/#syntax]
Change from:
-------------------------
The value of this attribute is interpreted as a URI Reference as defined
in RFC 2396 [IETF RFC 2396], after processing according to Section 3.1.
-------------------------
to:
-------------------------
The value of this attribute is interpreted as an IRI Reference as
defined in RFC 3987 [IETF RFC 3987], after processing according to
Section 3.1.
-------------------------
section 3.1 [http://www.w3.org/TR/2001/REC-xmlbase-20010627/#escaping]
Change from:
-------------------------
The set of characters allowed in xml:base attributes is the same as for
XML, namely [Unicode]. However, some Unicode characters are disallowed
from URI references, and thus processors must encode and escape these
characters to obtain a valid URI reference from the attribute value.
The disallowed characters include all non-ASCII characters, plus the
excluded characters listed in Section 2.4 of [IETF RFC 2396], except for
the number sign (#) and percent sign (%) characters and the square
bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters
must be escaped as follows:
1. Each disallowed character is converted to UTF-8 [IETF RFC 2279]
as one or more bytes.
2. Any bytes corresponding to a disallowed character are escaped
with the URI escaping mechanism (that is, converted to %HH, where HH is
the hexadecimal notation of the byte value).
3. The original character is replaced by the resulting character
sequence.
-------------------------
to:
-------------------------
To convert the value of the xml:base attribute to an IRI reference, the
following characters must be escaped:
* the control characters #x0 to #x1F and #x7F (most of which cannot
appear in XML)
* space #x20
Note:
Authors are advised to avoid unescaped spaces, as XML Schema has
identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and
` #x60
These characters are escaped by applying to them steps 2.1 to 2.3 of
Section 3.1 of [IETF RFC 3987].
If necessary for the implementation, an IRI reference is converted to a
URI reference according to the prescriptions of Section 3.1 of [IETF RFC
3987]. The two conversions (xml:base value to IRI reference, IRI
reference to URI reference) may be merged.
-------------------------
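As an illustration of how the two conversions might be merged, the following Python sketch goes from an xml:base value straight to a URI reference in one pass. The function name is hypothetical. Two simplifications are assumed and should be read as such: every character above #x7F is percent-encoded (RFC 3987 only names the ucschar and iprivate ranges), and host names are percent-encoded rather than converted with IDNA ToASCII, which RFC 3987 also permits.

```python
def xml_base_to_uri_reference(value: str) -> str:
    """One-pass sketch: xml:base value -> URI reference.

    Escapes the characters listed in the proposed text and, as a
    simplification, every non-ASCII character as well.
    """
    out = []
    for ch in value:
        cp = ord(ch)
        must_escape = (
            cp <= 0x20              # controls #x0-#x1F and space #x20
            or cp == 0x7F           # control character #x7F
            or ch in '<>"{}|\\^`'   # delimiters and unwise characters
            or cp > 0x7F            # non-ASCII (IRI -> URI step, simplified)
        )
        if must_escape:
            # UTF-8 encode, then write each byte as %HH (uppercase hex).
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


print(xml_base_to_uri_reference("http://example.org/r\u00e9sum\u00e9 dir/"))
# http://example.org/r%C3%A9sum%C3%A9%20dir/
```

Note that str.encode("utf-8") raises UnicodeEncodeError on lone surrogates, so such input would need to be rejected or handled before calling this function.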
XInclude 1.0
===========================================================
section 4.1.1 [http://www.w3.org/TR/2004/REC-xinclude-20041220/#IRIs]
Change from:
-------------------------
The href attribute value is converted to either a URI reference or an
IRI reference, as appropriate to the implementation.
Work is currently in progress to produce an RFC defining
Internationalized Resource Identifiers (IRIs). Since this work is not
yet complete, in this section we define IRI references syntactically. We
expect to issue an erratum replacing portions of this section with a
reference to the RFC when it is published. For a more general definition
and discussion of IRIs see [IRI draft] (work in progress).
[Definition: An IRI reference is a string that can be converted to a URI
reference by escaping the following additional characters:]
* the Unicode plane 0 characters #xA0 - #xD7FF, #xF900-#xFDCF,
#xFDF0-#xFFEF
* the Unicode plane 1-14 characters #x10000-#x1FFFD ... #xE0000-#xEFFFD
To convert the value of the href attribute to an IRI reference, the
following characters must be escaped:
* space #x20
Note:
Authors are advised to avoid unescaped spaces, as XML Schema has
identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and
` #x60
These characters are escaped as follows:
1. Each additional character is converted to UTF-8 [Unicode] as one
or more bytes.
2. The resulting bytes are escaped with the URI escaping mechanism
(that is, converted to %HH, where HH is the hexadecimal notation of the
byte value).
3. The original character is replaced by the resulting character
sequence.
To convert an IRI reference to a URI reference, the additional
characters allowed in IRIs must be escaped using the same method.
-------------------------
to:
-------------------------
The value of the href attribute must be an IRI reference as defined in
[IETF RFC 3987] or must result in an IRI reference after the escaping
procedure described below is applied. (By design, all URIs (Uniform
Resource Identifiers) as defined in [IETF RFC 3986] are also IRIs.)
To convert the value of the href attribute to an IRI reference, the
following characters must be escaped:
* the control characters #x0 to #x1F and #x7F (most of which cannot
appear in XML)
* space #x20
Note:
Authors are advised to avoid unescaped spaces, as XML Schema has
identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and
` #x60
These characters are escaped by applying to them steps 2.1 to 2.3 of
Section 3.1 of [IETF RFC 3987].
If necessary for the implementation, an IRI reference is converted to a
URI reference according to the prescriptions of Section 3.1 of [IETF RFC
3987]. The two conversions (href value to IRI reference, IRI reference
to URI reference) may be merged.
-------------------------
XML 1.0 and XML 1.1
===========================================================
XML 1.0 section 4.2.2 [http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent]
XML 1.1 section 4.2.2 [http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-external-ent]
Change from:
-------------------------
System identifiers (and other XML strings meant to be used as URI
references) MAY contain characters that, according to [IETF RFC 2396]
and [IETF RFC 2732], must be escaped before a URI can be used to
retrieve the referenced resource. The characters to be escaped are the
control characters #x0 to #x1F and #x7F (most of which cannot appear in
XML), space #x20, the delimiters '<' #x3C, '>' #x3E and '"' #x22, the
unwise characters '{' #x7B, '}' #x7D, '|' #x7C, '\' #x5C, '^' #x5E and
'`' #x60, as well as all characters above #x7F. Since escaping is not
always a fully reversible process, it MUST be performed only when
absolutely necessary and as late as possible in a processing chain. In
particular, neither the process of converting a relative URI to an
absolute one nor the process of passing a URI reference to a process or
software component responsible for dereferencing it SHOULD trigger
escaping. When escaping does occur, it MUST be performed as follows:
1. Each character to be escaped is represented in UTF-8 [Unicode3]
as one or more bytes.
2. The resulting bytes are escaped with the URI escaping mechanism
(that is, converted to %HH, where HH is the hexadecimal notation of the
byte value).
3. The original character is replaced by the resulting character
sequence.
-------------------------
to:
-------------------------
System identifiers (and other XML strings meant to be used as URI
references) MAY contain characters that, according to [IETF RFC 3986],
must be escaped before a URI can be used to retrieve the referenced
resource. This escaping MUST be performed following the prescriptions of
Section 3.1 of [IETF RFC 3987], including the escaping (optional in RFC
3987) of the following characters:
* the control characters #x0 to #x1F and #x7F (most of which cannot
appear in XML)
* space #x20
Note:
Authors are advised to avoid unescaped spaces, as XML Schema has
identified them as an interoperability risk.
* the delimiters < #x3C, > #x3E and " #x22
* the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and
` #x60
-------------------------
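The current XML text observes that escaping is not always a fully reversible process. The short Python sketch below, with a hypothetical function name, shows one reason: because '%' is not in the set of escaped characters, an already-escaped system identifier passes through unchanged, so two different inputs can yield the same output. As an assumption for brevity, the sketch escapes every character above #x7F in addition to the listed characters, and it will raise UnicodeEncodeError on lone surrogates.

```python
import re

# Listed characters (controls, space, delimiters, unwise characters)
# plus, as a simplification, everything above #x7F.
_TO_ESCAPE = re.compile(r'[\x00-\x20\x7f<>"{}|\\^`\x80-\U0010ffff]')


def escape_system_identifier(sysid: str) -> str:
    """Percent-encode each matched character via its UTF-8 bytes."""
    return _TO_ESCAPE.sub(
        lambda m: "".join(f"%{b:02X}" for b in m.group().encode("utf-8")),
        sysid,
    )


raw = "my file.dtd"
already_escaped = "my%20file.dtd"

# Both inputs produce the same result, so the original cannot be recovered.
print(escape_system_identifier(raw))              # my%20file.dtd
print(escape_system_identifier(already_escaped))  # my%20file.dtd
```

A side effect of leaving '%' alone is that applying the function twice gives the same result as applying it once.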
Received on Wednesday, 11 January 2006 15:49:14 UTC