FW: IRI-Everywhere (was RE: last call comments, usage of IRI rather than URI)

From: Misha Wolf <Misha.Wolf@reuters.com>
Date: Thu, 05 Dec 2002 12:49:30 +0000
Message-ID: <T5efb27557cc407b707450@reuters.com>
To: www-international@w3.org

fyi.

Misha

-----Original Message-----
From: jeremy carroll [mailto:jjc@hpl.hp.com] 
Sent: 04 December 2002 11:09
To: www-tag@w3.org
Subject: IRI-Everywhere (was RE: last call comments, usage of IRI rather than URI)



Julian Reschke wrote:
> This issue could
> *probably* be solved by explicitly forbidding those ASCII characters in
> namespace names which have been forbidden in URIs as well

I am increasingly convinced by the case Julian has been making that IRIs
should treat 7-bit ASCII characters according to the URI specs.

I had some test data connected with the treatment of the 'excluded'
characters.

It is at least a little surprising that all of the following are relative
IRIs (written in XML attribute notation, with CDATA attribute-value
normalization):

"&lt;b&gt;b"
"&#9;"
"&#10;"
"   "
"{"
"\"
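
A quick way to see why these values are awkward is to check each one, as an
XML parser would report it, against the ASCII characters that RFC 2396
section 2.4.3 excludes. The Python below is a minimal sketch; the helper
name `excluded_in` is hypothetical, and it leaves out '#' and '%' because
they keep special roles (fragment separator and escape marker) in URI
references.

```python
# US-ASCII characters excluded by RFC 2396, section 2.4.3: controls,
# space, the delimiters '<' '>' '"' and the "unwise" characters.
# '#' and '%' are omitted here because they have special roles.
EXCLUDED = (
    {chr(c) for c in range(0x20)} | {"\x7f", " "}
    | set('<>"') | set("{}|\\^[]`")
)


def excluded_in(value: str) -> list[str]:
    """Return the distinct RFC 2396-excluded characters in value, sorted."""
    return sorted({ch for ch in value if ch in EXCLUDED})


# The attribute values above after XML parsing: character references are
# expanded, and the characters they produce are not turned into spaces.
examples = ["<b>b", "\t", "\n", "   ", "{", "\\"]
for value in examples:
    print(repr(value), excluded_in(value))
```

Every one of the six values contains at least one excluded character, so
none of them would be a legal URI reference.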

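These characters are excluded in RFC 2396 because other systems use them:
'<', '>' and '"' often delimit URIs in text documents and protocol fields,
whitespace separates URIs in many contexts, and gateways and transport
agents are known to alter some of the "unwise" characters. Within XML this
is perhaps not an issue, but as soon as XML has to interoperate with other
systems it becomes difficult.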

An example that came up yesterday was mapping such relative IRIs out of
RDF/XML into an RDF graph (in memory) and then out into N3.
The N3 grammar expects to be able to use < and > as delimiters, as indicated
in RFC 2396.
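
The N3 difficulty can be reproduced with a toy writer and reader for just
the angle-bracket form. This is a simplified Python sketch, not a real N3
parser: it assumes the reader ends a URI reference at the first '>', and
the function names `write_uriref` and `read_uriref` are hypothetical.

```python
import re

# Toy model of one N3 production: a URI reference written as <...>,
# which the reader ends at the first '>'. Not a full N3 parser.
URIREF = re.compile(r"<([^>]*)>")


def write_uriref(iri: str) -> str:
    """Wrap an IRI in angle brackets without any escaping."""
    return "<" + iri + ">"


def read_uriref(text: str) -> str:
    """Return the contents of the URI reference at the start of text."""
    match = URIREF.match(text)
    if match is None:
        raise ValueError("no URI reference at start of text")
    return match.group(1)


original = "<b>b"                  # the first relative IRI above
written = write_uriref(original)   # gives "<<b>b>"
print(read_uriref(written))        # prints <b, so the IRI is truncated
```

Escaping '<' and '>' first (writing %3Cb%3Eb) would round-trip cleanly,
which is what the URI rules already require.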

I found myself unable to defend this treatment to a colleague whose N3
parser was having difficulty.

Is there a reason why IRIs differ from URIs on the ASCII subset?
Or is it, essentially, an historical accident?

Jeremy Carroll

Appendix - Sample normative text:
[[
The characters to be escaped are the control characters #x0 to #x1F and #x7F
(most of which cannot appear in XML), space #x20, the delimiters '<' #x3C,
'>' #x3E and '"' #x22, the unwise characters '{' #x7B, '}' #x7D, '|' #x7C,
'\' #x5C, '^' #x5E and '`' #x60, as well as all characters above #x7F.
]]
http://www.w3.org/XML/xml-V10-2e-errata#E26

equivalently
[[
the disallowed characters include all non-ASCII characters, plus the
excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the
number sign (#) and percent sign (%) and the square bracket characters
re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped
]]
http://www.w3.org/TR/xlink/#link-locators
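
The two formulations above can be checked against each other and turned
into an escaping routine. This is a minimal Python sketch, not code from
either specification: the set names and the helper `escape_disallowed` are
hypothetical, and it assumes the usual procedure of encoding each
disallowed character as UTF-8 and writing each byte as %HH.

```python
# Excluded US-ASCII characters from RFC 2396, section 2.4.3.
CONTROL = {chr(c) for c in range(0x20)} | {"\x7f"}
SPACE = {" "}
DELIMS = set('<>#%"')
UNWISE = set("{}|\\^[]`")
RFC2396_EXCLUDED = CONTROL | SPACE | DELIMS | UNWISE

# XML errata E26 wording: control characters, space, the delimiters
# '<' '>' '"' and the unwise characters '{' '}' '|' '\' '^' '`'.
E26_ASCII = CONTROL | SPACE | set('<>"') | set("{}|\\^`")

# XLink wording: the RFC 2396 excluded characters, except '#', '%'
# and the square brackets re-allowed by RFC 2732.
XLINK_ASCII = RFC2396_EXCLUDED - set("#%[]")


def must_escape(ch: str) -> bool:
    """True if ch must be escaped (both texts also cover all of > #x7F)."""
    return ch in E26_ASCII or ord(ch) > 0x7F


def escape_disallowed(ref: str) -> str:
    """Replace each disallowed character by the %HH escapes of its UTF-8
    bytes (uppercase hex is a choice made in this sketch)."""
    out = []
    for ch in ref:
        if must_escape(ch):
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


assert E26_ASCII == XLINK_ASCII    # the two wordings agree on ASCII
print(escape_disallowed("<b>b"))   # prints %3Cb%3Eb
print(escape_disallowed("a b/caf\u00e9"))
```

Note that '%' is never escaped, so escapes already present in the input
pass through unchanged.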

-- alternative text, URI compatible

[[
The characters to be escaped are all characters above #x7F.
]]
or
[[
the disallowed characters are all non-ASCII characters. Disallowed
characters must be escaped
]]
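
Under the URI-compatible text only non-ASCII characters are escaped, so any
excluded ASCII character simply makes the string an invalid IRI, as it
would for a URI. The Python below is a hypothetical sketch of that reading;
`escape_non_ascii` and `iri_to_uri` are illustrative names, and the
rejection rule is an assumption about how the text would be applied.

```python
# Excluded ASCII characters as listed in the first sample text
# (RFC 2396 minus '#', '%', '[' and ']').
EXCLUDED_ASCII = {chr(c) for c in range(0x20)} | {"\x7f"} | set(' <>"{}|\\^`')


def escape_non_ascii(ref: str) -> str:
    """Escape only characters above #x7F, as UTF-8 %HH sequences."""
    return "".join(
        "".join(f"%{b:02X}" for b in ch.encode("utf-8")) if ord(ch) > 0x7F else ch
        for ch in ref
    )


def iri_to_uri(iri: str) -> str:
    """Hypothetical conversion: ASCII is never escaped, so an excluded
    ASCII character makes the IRI invalid instead of being escaped."""
    bad = sorted({ch for ch in iri if ch in EXCLUDED_ASCII})
    if bad:
        raise ValueError(f"not a valid IRI reference, contains {bad!r}")
    return escape_non_ascii(iri)


print(iri_to_uri("caf\u00e9/menu"))   # prints caf%C3%A9/menu
```

With this reading, iri_to_uri("<b>b") raises ValueError, so the first
example above is rejected rather than accepted as a relative IRI.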

Received on Thursday, 5 December 2002 07:57:41 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 2 June 2009 19:16:59 GMT