- From: Norman Walsh <Norman.Walsh@Sun.COM>
- Date: Wed, 14 Mar 2007 11:02:09 -0400
- To: public-xml-core-wg@w3.org
- Message-ID: <87fy87vpfy.fsf@nwalsh.com>
Here's my first attempt at a crisper description of XRIs.

<div3 id="xml-resource-identifier">
<head>XML Resource Identifiers</head>
<p>The syntactic constraints of IRIs (<bibref ref="rfc3987"/>) and URIs (<bibref ref="rfc3986"/>) require that certain common punctuation characters (such as spaces, quotation marks, and various sorts of delimiters) be percent encoded. However, it is often inconvenient for authors to encode these characters.</p>
<p>Historically, XML system identifiers and, more generally, the values of XML attributes that are intended to contain IRIs or URIs have allowed authors to provide values that use these characters literally.</p>
<p>An <termdef id="dt-xml-resource-identifier" term="XML resource identifier"><term>XML resource identifier</term> is an IRI or URI in which certain common punctuation characters may appear literally. It can be converted into an IRI or URI by the application of a few simple encoding rules.</termdef> To convert an <termref def="dt-xml-resource-identifier">XML resource identifier</termref> to an IRI reference, the following characters must be percent encoded:</p>
<ulist>
<item><p>the control characters #x1 to #x1F and #x7F (the control character #x0 can never appear)</p></item>
<item><p>space #x20</p>
<note>
<p>Authors are advised to avoid literal space characters, as XML Schema has identified them as an interoperability risk.</p>
</note>
</item>
<item><p>the delimiters “&lt;” #x3C, “&gt;” #x3E, and “"” #x22</p></item>
<item><p>the unwise characters “{” #x7B, “}” #x7D, “|” #x7C, “\” #x5C, “^” #x5E, and “`” #x60</p></item>
</ulist>
<p>These characters are percent encoded by applying steps 2.1 to 2.3 of Section 3.1 of <bibref ref="rfc3987"/> to them.</p>
<p>Although many applications do not check whether an XML resource identifier is legal, the check can be performed by applying the encoding rules above.
If the resulting string is a legal IRI or URI, then the XML resource identifier is legal.</p>
<p>Processing a relative identifier against a base is straightforward: the algorithms of <bibref ref="rfc3986"/> can be applied directly, treating the characters additionally allowed in XML resource identifiers in the same way as unreserved characters in URI references.</p>
<p>If required, the IRI reference resulting from percent encoding an XML resource identifier can be converted to a URI reference by following the prescriptions of Section 3.1 of <bibref ref="rfc3987"/>.</p>
<p>Conversion from an XML resource identifier to an IRI or a URI <termref def="dt-must">must</termref> be performed only when absolutely necessary and as late as possible in a processing chain. In particular, neither the process of converting a relative XML resource identifier to an absolute one nor the process of passing an XML resource identifier to a process or software component responsible for dereferencing it <termref def="dt-must">should</termref> trigger percent encoding.</p>
</div3>

Be seeing you,
norm

--
Norman Walsh
XML Standards Architect
Sun Microsystems, Inc.
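
For illustration only, the encoding rules in the draft above might be sketched in Python roughly as follows. This is a minimal sketch, not part of the proposed text: the names `MUST_ENCODE` and `xri_to_iri_reference` are hypothetical, and the sketch assumes that the RFC 3987 steps amount to taking each listed character's UTF-8 bytes and writing each byte as an uppercase `%HH` triplet. It does not check whether the result is a legal IRI.

```python
# Code points the draft lists as requiring percent encoding:
# controls #x1-#x1F and #x7F, space, the delimiters < > ",
# and the "unwise" characters { } | \ ^ `
MUST_ENCODE = set(range(0x01, 0x20)) | {
    0x7F, 0x20,
    0x3C, 0x3E, 0x22,
    0x7B, 0x7D, 0x7C, 0x5C, 0x5E, 0x60,
}


def xri_to_iri_reference(xri: str) -> str:
    """Percent encode only the characters listed above (hypothetical helper).

    All other characters, including non-ASCII ones that are legal in
    IRIs, are passed through unchanged.
    """
    out = []
    for ch in xri:
        if ord(ch) in MUST_ENCODE:
            # Convert to UTF-8 bytes, then write each byte as %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(xri_to_iri_reference("my file<1>.xml"))
```

Consistent with the draft's last paragraph, a caller would apply such a function only at the point where an IRI is actually required, not when resolving a relative identifier or handing it to a dereferencing component.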
Received on Wednesday, 14 March 2007 15:02:22 UTC