XML Resource Identifiers

Here's my first attempt at a crisper description of XRIs.

<div3 id="xml-resource-identifier">
<head>XML Resource Identifiers</head>

<p>The syntactic constraints of IRIs (<bibref ref="rfc3987"/>) and
URIs (<bibref ref="rfc3986"/>) require that certain common punctuation
characters (such as spaces, quotation marks, and various sorts of
delimiters) be percent encoded. However, it is often inconvenient
for authors to encode these characters.</p>

<p>Historically, XML system identifiers and, more generally, the values
of XML attributes that are intended to contain IRIs or URIs have
allowed authors to provide values that use these characters literally.</p>

<p>An <termdef id="dt-xml-resource-identifier" term="XML resource identifier"><term>XML
resource identifier</term> is an IRI or URI in which certain common punctuation
characters may appear literally. It can be converted into an IRI or URI by
the application of a few simple encoding rules.</termdef>
To convert an <termref def="dt-xml-resource-identifier">XML
resource identifier</termref> to an IRI reference, the following
characters must be percent encoded:</p>

<ulist>
<item><p>the control characters #x1 to #x1F and #x7F (the control character #x0
can never appear)</p></item>
<item><p>space #x20</p>
<note>
<p>Authors are advised to avoid literal space characters, as XML Schema
has identified them as an interoperability risk.</p>
</note>
</item>
<item>
<p>the delimiters “&lt;” #x3C, “&gt;” #x3E, and “&quot;” #x22</p></item>
<item>
<p>the unwise characters “{” #x7B, “}” #x7D, “|” #x7C, “\” #x5C, “^” #x5E,
and “`” #x60</p>
</item>
</ulist>

<p>These characters are percent encoded by applying steps 2.1 to 2.3 of
Section 3.1 of <bibref ref="rfc3987"/> to them.</p>

<p>Though many applications do not check whether an XML
Resource Identifier is legal, the check can be made by applying the encoding
rules above: if the resulting string is a legal IRI or URI, then the XML
Resource Identifier is legal.</p>

<p>Processing a relative identifier against a base is handled
straightforwardly; the algorithms of <bibref ref="rfc3986"/> can be
applied directly, treating the characters additionally allowed in XML
resource identifiers in the same way that unreserved characters are in
URI references.</p>

<p>If required, the IRI reference resulting from percent encoding an XML
Resource Identifier can be converted to a URI reference by
following the prescriptions of Section 3.1 of <bibref ref="rfc3987"/>.
</p>

<p>Conversion from an XML Resource Identifier to an IRI or a URI
<termref def="dt-must">must</termref>
be performed only when absolutely necessary and as late as
possible in a processing chain. In particular, neither the
process of converting a relative XML Resource Identifier to an
absolute one nor the process of passing an XML Resource Identifier
to a process or software component responsible for dereferencing
it <termref def="dt-must">should</termref> trigger percent encoding.</p>

</div3>
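
To make the encoding rules in the draft concrete, here is a minimal Python sketch of the conversion from an XML resource identifier to an IRI reference. It is an illustration only, not part of the proposed text: the function name xri_to_iri and the constant _MUST_ENCODE are hypothetical, and the sketch assumes the input is already a Python string of Unicode characters. Each listed character is converted to its UTF-8 octets and each octet is written as %HH, which is my reading of steps 2.1 to 2.3 of Section 3.1 of RFC 3987.

```python
# Code points that the draft says must be percent encoded.
_MUST_ENCODE = (
    set(range(0x01, 0x20))                    # control characters #x1 to #x1F
    | {0x7F}                                  # control character #x7F
    | {0x20}                                  # space
    | {0x3C, 0x3E, 0x22}                      # delimiters < > "
    | {0x7B, 0x7D, 0x7C, 0x5C, 0x5E, 0x60}    # unwise characters { } | \ ^ `
)


def xri_to_iri(xri: str) -> str:
    """Percent encode only the characters listed in the draft.

    All other characters, including existing percent escapes and
    non-ASCII characters, are passed through unchanged.
    """
    out = []
    for ch in xri:
        code = ord(ch)
        if code == 0:
            # The draft notes that #x0 can never appear.
            raise ValueError("#x0 cannot appear in an XML resource identifier")
        if code in _MUST_ENCODE:
            # Convert to UTF-8 octets, then write each octet as %HH.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

For example, xri_to_iri("http://example.org/my file.xml") yields "http://example.org/my%20file.xml". In keeping with the last paragraph of the draft, a function like this would be called as late as possible, and not when resolving a relative identifier or handing the identifier to a dereferencing component.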


                                        Be seeing you,
                                          norm

-- 
Norman Walsh
XML Standards Architect
Sun Microsystems, Inc.

Received on Wednesday, 14 March 2007 15:02:22 UTC