- From: Richard Tobin <richard@inf.ed.ac.uk>
- Date: Thu, 16 Feb 2006 15:33:29 +0000 (GMT)
- To: public-xml-core-wg@w3.org
I have a few comments on Francois's XRI definition:

> [Definition: *XML resource identifiers* are XML string meant to be used
> as IRI references or URI references]. System identifiers are XML

I think we agreed to change "XML string" to "string" yesterday. The
definition should probably be changed to be in the singular too: "An
*XML resource identifier* is a string ...".

> resource identifiers. An XML resource identifier may contain characters
> that, according to [IETF RFC 3987] and [IETF RFC 3986], must be escaped
> before the string can be used to retrieve the referenced resource. To
> convert an XML resource identifier to an IRI reference, the following
> characters must be escaped:
> * the control characters #x0 to #x1F and #x7F (most of which cannot
>   appear in XML)

Most of these *can* appear in XML 1.1 as character references.
Character references cannot be used in system identifiers, but you can
construct an internal entity containing a system identifier containing
a control character. I suggest dropping the parenthesized comment.

> * space #x20
> Note:
> Authors are advised to avoid unescaped spaces, as XML Schema has
> identified them as an interoperability risk.
> * the delimiters < #x3C, > #x3E and " #x22
> * the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and
>   ` #x60
> These characters are escaped by applying to them steps 2.1 to 2.3 of
> Section 3.1 of [IETF RFC 3987].
> If necessary for the implementation, an IRI reference is converted to a
> URI reference by following the prescriptions of Section 3.1 of [IETF RFC
> 3987]. This conversion MUST be performed only when absolutely necessary
> and as late as possible in a processing chain. In particular, neither
> the process of converting a relative IRI to an absolute one nor the
> process of passing an IRI reference to a process or software component
> responsible for dereferencing it SHOULD trigger escaping.

What about the XRI->IRI escaping? Must it happen late? And I can no
longer remember exactly what we're getting at here; how exactly can you
tell whether the conversion was done early or late?
Also, I would prefer it if the definition actually defined which strings
are legal XRIs. It's implicit that they are ones that do in fact result
in IRIs after the escaping, but this should either be stated explicitly
or a production should be given. This seems particularly important for
Namespaces, where no escaping is in fact done.
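
To make the XRI->IRI step concrete, here is a minimal sketch of the escaping as the quoted draft describes it. The function name `xri_to_iri` and the `_ESCAPE` table are hypothetical names chosen for illustration; the character set is copied from the draft's list, and the per-character treatment follows steps 2.1 to 2.3 of Section 3.1 of RFC 3987 (UTF-8 encode, then percent-encode each octet). It does not check that the result is a legal IRI reference, which is exactly the gap noted above.

```python
# Code points the quoted draft lists as needing escaping.
# Assumption: this set mirrors the draft text, not a published production.
_ESCAPE = (
    set(range(0x00, 0x20)) | {0x7F}           # control characters
    | {0x20}                                  # space
    | {0x3C, 0x3E, 0x22}                      # delimiters < > "
    | {0x7B, 0x7D, 0x7C, 0x5C, 0x5E, 0x60}    # unwise characters { } | \ ^ `
)


def xri_to_iri(xri: str) -> str:
    """Escape the listed characters; leave everything else untouched."""
    out = []
    for ch in xri:
        if ord(ch) in _ESCAPE:
            # RFC 3987 section 3.1, steps 2.1-2.3: UTF-8 octets, each as %HH
            # (uppercase hex, as the RFC recommends).
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(xri_to_iri("my file{1}.xml"))
```

Note that "%" is not in the list, so existing escapes pass through unchanged, and non-ASCII characters are left alone because they are allowed in IRI references.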
-- Richard
Received on Thursday, 16 February 2006 15:33:35 UTC