Re: [charmodReview-17] replacing all URIs with IRIs

From: <Misha.Wolf@reuters.com>
Date: Sat, 25 May 2002 11:51:04 +0100
Message-ID: <T5b13e2caddc407b7066f4@reuters.com>
To: Aaron Swartz <me@aaronsw.com>
Cc: www-tag@w3.org

On 25/05/2002 01:18:20 Aaron Swartz wrote:
> On Friday, May 24, 2002, at 05:04 PM, Misha.Wolf@reuters.com wrote:
[...]

> >> break many utilities which have made the assumption that RDF
> >> identifiers
> > Which utilities?
>
> All the current RDF tools, I think. I don't think any of them have been
> updated to support normalization or Unicode storage. Certainly all the
> tools I've written don't support it. If you take a look at the RDF
> Validator[1] you'll find that it %-encodes characters like Ł, as most of
> the RDF tools I know do.

On the other hand, the description of N-Triples says (in section 3.3
URI References)[1]:

| Characters above the US-ASCII range are made available by the
| \u or \U escapes as described in section Strings for ranges
| [#x80-#xFFFF] and [#x10000-#x10FFFF] respectively.
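
As a rough illustration of the rule quoted above (not taken from any RDF
tool), the following Python sketch writes characters above the US-ASCII
range with \u or \U escapes. The function name ntriples_escape is
hypothetical, and the sketch deliberately ignores the other escapes
N-Triples defines for ASCII characters such as backslash and quotes.

```python
def ntriples_escape(text: str) -> str:
    """Escape characters above US-ASCII in the \\u / \\U style quoted above.

    Assumption: only the non-ASCII ranges are handled here; ASCII
    characters pass through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # US-ASCII: left as-is in this sketch
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)     # range [#x80-#xFFFF]
        else:
            out.append("\\U%08X" % cp)     # range [#x10000-#x10FFFF]
    return "".join(out)


print(ntriples_escape("http://example.org/Ł"))  # http://example.org/\u0141
```

Because the escape names the Unicode code point directly, reading it back
needs no knowledge of any byte-level character encoding.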

> >> I can understand presenting strings this way for user-display and
> >> user-entry but storing them this way and making them the official
> >> encoding seems to be going too far. I would think that simply using
> >> UTF-8 %-encoding would be fine for these purposes.
> >
> > How do you propose to display these strings in a meaningful manner?
> > %HH encoding is not invertible, except in the case of ASCII characters.
> > This is because the character encoding is not, in general, known.
>
> That is why I said UTF-8. I am fine with requiring a specific character
> encoding to make the process reversible.

RFC 2396, in specifying the use of %HH escaping, does not confine its
use to UTF-8.  There are plenty of URIs out there which use %HH to
escape other character encodings.  Once you have a %HH-escaped URI,
there is no way back, unless you know how it was created.  If an RDF
database contains some %HH-escaped URIs, how can anyone know whether
they arrived %HH-escaped, or whether the %HH-escaping was applied just
before their insertion in the database?
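
To make the ambiguity concrete, here is a small Python sketch using the
standard urllib.parse module. The helper names escape_with and
candidate_readings are hypothetical, and the three encodings tried are
only examples of what a producer might have used.

```python
from urllib.parse import quote, unquote


def escape_with(text: str, encoding: str) -> str:
    """%HH-escape text after converting it to bytes in the given encoding."""
    return quote(text, safe="/:", encoding=encoding)


def candidate_readings(escaped, encodings=("utf-8", "iso-8859-1", "iso-8859-2")):
    """Decode the same %HH string under several assumed encodings."""
    return {enc: unquote(escaped, encoding=enc, errors="replace")
            for enc in encodings}


utf8_form = escape_with("Ł", "utf-8")         # '%C5%81'
latin2_form = escape_with("Ł", "iso-8859-2")  # '%A3'

# The escaped string alone does not say which encoding produced it.
readings = candidate_readings(latin2_form)
print(readings["iso-8859-2"])  # Ł
print(readings["iso-8859-1"])  # £
```

The same character yields two different escaped forms, and the single
escaped form %A3 reads back as two different characters, depending only
on an assumption that the URI itself does not record.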

[1] http://www.w3.org/TR/rdf-testcases/#sec-uri-encoding

Misha Wolf
I18N WG Chair

> --
> Aaron Swartz [http://www.aaronsw.com/]
>

Received on Saturday, 25 May 2002 08:20:12 UTC