
XML Resource Identifiers

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Thu, 16 Feb 2006 10:40:59 -0500
To: public-xml-core-wg@w3.org
Message-ID: <87fymjux9w.fsf@nwalsh.com>
/ Richard Tobin <richard@inf.ed.ac.uk> was heard to say:
| I have a few comments on Francois's XRI definition:

*Please* don't use "XRI" as the acronym for these things. There's an
OASIS TC that's created XRIs (Extensible Resource Identifiers).
They're just reinventing http...never mind, wrong list. :-)

|     [Definition: *XML resource identifiers* are XML string meant to be used 
|     as IRI references or URI references].  System identifiers are XML 
|
| I think we agreed to change "XML string" to "string" yesterday.  The

To "character string" actually

| definition should probably be changed to be in the singular too: "An
| *XML resource identifier* is a string ...".

I might have done that as well.

|     resource identifiers.  An XML resource identifier may contain characters 
|     that, according to [IETF RFC 3987] and [IETF RFC 3986], must be escaped 
|     before the string can be used to retrieve the referenced resource. To 
|     convert an XML resource identifier to an IRI reference, the following 
|     characters must be escaped:
|
|          * the control characters #x0 to #x1F and #x7F (most of which cannot 
|     appear in XML)
|
| Most of these *can* appear in XML 1.1 as character references.
| Character references cannot be used in system identifiers, but you can
| construct an internal entity containing a system identifier containing
| a control character.  I suggest dropping the parenthesized comment.

Works for me.

|          * space #x20
|
|            Note:
|
|            Authors are advised to avoid unescaped spaces, as XML Schema has 
|     identified them as an interoperability risk.
|
|          * the delimiters < #x3C, > #x3E and " #x22
|          * the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and 
|     ` #x60
|
|     These characters are escaped by applying to them steps 2.1 to 2.3 of 
|     Section 3.1 of [IETF RFC 3987].
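
A minimal sketch of the escaping step described above, in Python. The
names ESCAPE_CODEPOINTS and escape_to_iri are hypothetical, chosen only
for illustration; they do not come from the draft. The sketch assumes the
input is already a Unicode string and escapes only the characters listed
in the draft, so "%" and non-ASCII characters pass through untouched.

```python
# Code points the draft lists as needing escaping (illustrative constant).
ESCAPE_CODEPOINTS = (
    set(range(0x00, 0x20))                       # control characters #x0 to #x1F
    | {0x7F}                                     # control character #x7F
    | {0x20}                                     # space
    | {0x3C, 0x3E, 0x22}                         # delimiters < > "
    | {0x7B, 0x7D, 0x7C, 0x5C, 0x5E, 0x60}       # unwise characters { } | \ ^ `
)


def escape_to_iri(identifier: str) -> str:
    """Percent-escape the listed characters, leaving everything else as-is."""
    pieces = []
    for ch in identifier:
        if ord(ch) in ESCAPE_CODEPOINTS:
            # RFC 3987 section 3.1, steps 2.1 to 2.3: encode the character
            # as UTF-8, write each octet as %HH (uppercase hex), and
            # substitute the result for the original character.
            pieces.append("".join("%%%02X" % octet for octet in ch.encode("utf-8")))
        else:
            pieces.append(ch)
    return "".join(pieces)
```

Because "%" is not in the list, a string that already contains escapes
such as "%20" is not escaped a second time by this sketch.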
|
|     If necessary for the implementation, an IRI reference is converted to a 
|     URI reference by following the prescriptions of Section 3.1 of [IETF RFC 
|     3987]. This conversion MUST be performed only when absolutely necessary 
|     and as late as possible in a processing chain. In particular, neither 
|     the process of converting a relative IRI to an absolute one nor the 
|     process of passing an IRI reference to a process or software component 
|     responsible for dereferencing it SHOULD trigger escaping.
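
A rough Python sketch of the IRI-to-URI conversion mentioned above,
under simplifying assumptions: the function name iri_to_uri is
hypothetical, the input is assumed to be an already normalized Unicode
string, and the optional IDNA handling of the host name that RFC 3987
section 3.1 allows is ignored. It simply percent-encodes the UTF-8
octets of every non-ASCII character.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character of an IRI reference."""
    pieces = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII characters are already legal in the URI form (assuming
            # the input is a valid IRI reference), so copy them through.
            pieces.append(ch)
        else:
            # UTF-8 encode the character and write each octet as %HH.
            pieces.append("".join("%%%02X" % octet for octet in ch.encode("utf-8")))
    return "".join(pieces)
```

Calling such a function only at the point where a URI is actually
required would be one way to keep the conversion "as late as possible".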
|
| What about the XRI->IRI escaping?  Must it happen late?  And I can no
| longer remember exactly what we're getting at here; how exactly can you
| tell whether the conversion was done early or late?
|
| Also, I would prefer it if the definition actually defined which strings
| are legal XRIs.  It's implicit that they are ones that do in fact result
| in IRIs after the escaping, but this should either be stated explicitly
| or a production should be given.  This seems particularly important for
| Namespaces, where no escaping is in fact done.

Uh....Francois? :-)

                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.

Received on Thursday, 16 February 2006 15:41:09 UTC
