XRIs

I have a few comments on Francois's XRI definition:


    [Definition: *XML resource identifiers* are XML string meant to be used 
    as IRI references or URI references].  System identifiers are XML 

I think we agreed to change "XML string" to "string" yesterday.  The
definition should probably be changed to be in the singular too: "An
*XML resource identifier* is a string ...".

    resource identifiers.  An XML resource identifier may contain characters 
    that, according to [IETF RFC 3987] and [IETF RFC 3986], must be escaped 
    before the string can be used to retrieve the referenced resource. To 
    convert an XML resource identifier to an IRI reference, the following 
    characters must be escaped:
    
         * the control characters #x0 to #x1F and #x7F (most of which cannot 
    appear in XML)

Most of these *can* appear in XML 1.1 as character references.
Character references cannot be used in system identifiers, but you can
construct an internal entity whose replacement text includes a system
identifier with a control character in it.  I suggest dropping the
parenthesized comment.

         * space #x20
    
           Note:
    
           Authors are advised to avoid unescaped spaces, as XML Schema has 
    identified them as an interoperability risk.
    
         * the delimiters < #x3C, > #x3E and " #x22
         * the unwise characters { #x7B, } #x7D, | #x7C, \ #x5C, ^ #x5E and 
    ` #x60
    
    These characters are escaped by applying to them steps 2.1 to 2.3 of 
    Section 3.1 of [IETF RFC 3987].
    
    If necessary for the implementation, an IRI reference is converted to a 
    URI reference by following the prescriptions of Section 3.1 of [IETF RFC 
    3987]. This conversion MUST be performed only when absolutely necessary 
    and as late as possible in a processing chain. In particular, neither 
    the process of converting a relative IRI to an absolute one nor the 
    process of passing an IRI reference to a process or software component 
    responsible for dereferencing it SHOULD trigger escaping.
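
To make the XRI->IRI step concrete, here is a minimal sketch in Python of
the escaping as I read the quoted text: each listed character is replaced
by the %HH form of its UTF-8 octets (steps 2.1 to 2.3 of Section 3.1 of
RFC 3987).  The names xri_to_iri and _MUST_ESCAPE are mine, not from the
draft, and the sketch assumes that "%" and non-ASCII characters are left
alone at this stage.

```python
# Code points the quoted draft lists as needing escaping when an
# XML resource identifier is converted to an IRI reference.
_MUST_ESCAPE = (
    set(range(0x00, 0x20)) | {0x7F}            # control characters
    | {0x20}                                   # space
    | {0x3C, 0x3E, 0x22}                       # delimiters < > "
    | {0x7B, 0x7D, 0x7C, 0x5C, 0x5E, 0x60}     # unwise { } | \ ^ `
)


def xri_to_iri(xri: str) -> str:
    """Hypothetical helper: escape only the characters listed above."""
    out = []
    for ch in xri:
        if ord(ch) in _MUST_ESCAPE:
            # Steps 2.1 and 2.2: UTF-8 octets, each written as %HH
            # (uppercase hex).  Step 2.3: replace the original character.
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
        else:
            # Everything else, including "%" and non-ASCII, is untouched;
            # non-ASCII is only escaped by the later IRI->URI conversion.
            out.append(ch)
    return "".join(out)
```

Under these assumptions a string that contains none of the listed
characters passes through unchanged, so the function by itself says
nothing about whether the result is actually a legal IRI reference.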
    
The last quoted paragraph only constrains the IRI->URI conversion.  What
about the XRI->IRI escaping?  Must it happen late too?  And I can no
longer remember exactly what we're getting at here: how can you tell
whether the conversion was done early or late?

Also, I would prefer it if the definition actually defined which strings
are legal XRIs.  It's implicit that they are the strings that do in fact
yield IRI references after the escaping, but this should either be stated
explicitly or a production should be given.  This seems particularly
important for Namespaces, where no escaping is in fact done.

-- Richard

Received on Thursday, 16 February 2006 15:33:35 UTC