
IRI-Everywhere (was RE: last call comments, usage of IRI rather than URI)

From: jeremy carroll <jjc@hpl.hp.com>
Date: Wed, 4 Dec 2002 12:09:07 +0100
To: <www-tag@w3.org>
Message-ID: <MABBLGKMPIJFCKFGDBEPOEGCCBAA.jjc@hpl.hp.com>

Julian Reschke wrote:
> This issue could
> *probably* solved by explicitly forbidding those ASCII characters in
> namespace names which have been forbidden in URIs as well

I am increasingly convinced by the case Julian has been making that IRIs
should treat 7-bit ASCII characters according to the URI specs.

I had some test data connected with the treatment of the 'excluded'
characters.

That all of the following are relative IRIs is, at least a little,
surprising:
(I use XML attribute notation, CDATA attribute value normalization)

"&lt;b&gt;b"
"&#9;"
"&#10;"
"   "
"{"
"\"

These characters are excluded in RFC 2396 because other systems use them.
Within XML this is perhaps not an issue, but any interoperation between
XML and other systems becomes difficult.

An example that came up yesterday was mapping such relative IRIs out of
RDF/XML into an RDF graph (in memory) and then out into N3.
The N3 grammar expects to be able to use < and > as delimiters, as indicated
in RFC 2396.

I found myself unable to defend this treatment to a colleague whose N3
parser was having difficulty.
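
To illustrate the difficulty, here is a minimal Python sketch. It is not the
N3 grammar, only a hypothetical naive reader that, like the grammar, relies
on < and > as delimiters. The names write_ref and read_ref are made up for
this example.

```python
import re

def write_ref(iri: str) -> str:
    """Serialize an IRI reference between '<' and '>' delimiters."""
    return "<" + iri + ">"

def read_ref(text: str) -> str:
    """Naive reader (an assumption, not the real N3 grammar):
    take everything from the first '<' up to the first '>'."""
    match = re.match(r"<([^>]*)>", text)
    if match is None:
        raise ValueError("no delimited reference found")
    return match.group(1)

# An ordinary relative reference survives the round trip.
plain = "foo/bar"
print(read_ref(write_ref(plain)) == plain)

# The relative IRI "<b>b" does not: the reader stops at the first '>'.
odd = "<b>b"
print(write_ref(odd))
print(read_ref(write_ref(odd)) == odd)
```

Any reader that trusts the delimiters will cut the second reference short,
unless the writer escapes < and > first.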

Is there a case as to why IRIs differ from URIs on the ASCII subset?
Or is it, essentially, an historical accident?

Jeremy Carroll

Appendix - Sample normative text:
[[
The characters to be escaped are the control characters #x0 to #x1F and #x7F
(most of which cannot appear in XML), space #x20, the delimiters '<' #x3C,
'>' #x3E and '"' #x22, the unwise characters '{' #x7B, '}' #x7D, '|' #x7C,
'\' #x5C, '^' #x5E and '`' #x60, as well as all characters above #x7F.
]]
http://www.w3.org/XML/xml-V10-2e-errata#E26
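
A rough Python sketch of the character set listed in the text above. The
function name escape_disallowed is hypothetical, and the escaping mechanism
(each character converted to UTF-8, each resulting byte written as %HH) is
my assumption about what "escaped" means here, not part of the quoted
sentence.

```python
# ASCII characters named in the quoted text.
ESCAPED_ASCII = (
    set(range(0x00, 0x20)) | {0x7F}    # control characters #x0-#x1F, #x7F
    | {0x20}                           # space
    | {ord(c) for c in '<>"'}          # delimiters
    | {ord(c) for c in '{}|\\^`'}      # unwise characters
)

def escape_disallowed(text: str) -> str:
    """Percent-escape the listed ASCII characters and everything above #x7F.

    Assumption: escaping means UTF-8 encoding followed by %HH per byte.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in ESCAPED_ASCII or cp > 0x7F:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)

# The test data from this message, after XML attribute value normalization.
for sample in ["<b>b", "\t", "\n", "   ", "{", "\\"]:
    print(repr(sample), "->", escape_disallowed(sample))
```

Note that '#' and '%' pass through untouched, so an already escaped string
is not escaped a second time.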

equivalently
[[
the disallowed characters include all non-ASCII characters, plus the
excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the
number sign (#) and percent sign (%) and the square bracket characters
re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped
]]
http://www.w3.org/TR/xlink/#link-locators

-- alternative text, URI compatible

[[
The characters to be escaped are all characters above #x7F.
]]
or
[[
the disallowed characters are all non-ASCII characters. Disallowed
characters must be escaped
]]
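
For contrast, a sketch of the alternative text in Python, again assuming
UTF-8 followed by %HH escaping (the function name escape_non_ascii is
hypothetical). Only non-ASCII characters are touched. The excluded ASCII
characters are left alone by the escaping step, on the reading that a
URI-compatible IRI would not be allowed to contain them in the first place.

```python
def escape_non_ascii(text: str) -> str:
    """Percent-escape only characters above #x7F.

    Assumption: escaping means UTF-8 encoding followed by %HH per byte.
    ASCII characters, including '<', '>' and space, pass through unchanged;
    rejecting them would be a separate validation step.
    """
    return "".join(
        ch if ord(ch) <= 0x7F
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in text
    )

print(escape_non_ascii("r\u00e9sum\u00e9"))
print(escape_non_ascii("<b>b"))
```
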
Received on Wednesday, 4 December 2002 06:05:39 GMT
