Re: replacing all URIs with IRIs [charmodReview-17] from Martin Duerst on 2002-05-27 (www-tag@w3.org from May 2002)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 27 May 2002 14:41:59 +0900
To: Aaron Swartz <me@aaronsw.com>
Cc: www-tag@w3.org
Message-Id: <4.2.0.58.J.20020527141112.00a85848@localhost>
At 18:40 02/05/24 -0500, Aaron Swartz wrote:
>On Friday, May 24, 2002, at 06:11 PM, Martin Duerst wrote:

>>First some procedural points, starting with the end
>>of your mail:
>>
>>>I'm considering appealing this decision,
>>
>>The Character Model is in last call, so you can raise a comment.
>
>Oops, I should have been more clear. It was the RDF decision I was 
>thinking of appealing.

I see. But I guess they are not even in last call yet.

>I assume that charmod will be decided in its own way.

Yes, but obviously things should work together, and the RDF
spec should conform to the character model.


>>>I can understand presenting strings this way for user-display and 
>>>user-entry but storing them this way and making them the official 
>>>encoding seems to be going too far.
>>
>>XML can 'store' them without problems. N3 also should be able to do it.
>
>XML and N3 are interchange formats, I meant storage in the sense of 
>databases and APIs.

The RDF spec defines the XML representation. I don't think
there is any W3C spec for RDF databases or RDF APIs.

I also don't think there are any serious databases that would
have problems with 8-bit data. Same for APIs. The easiest way
to define an API is to say that the parameters are encoded in
UTF-8 (or maybe UTF-16). But of course you are always free to
define some other conventions for your own API.
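
To illustrate, here is a minimal sketch. It assumes Python and its
standard sqlite3 module as a stand-in for "a database"; the table
name and the identifier are made up for the example, and nothing
here is defined by the RDF spec. It stores an identifier containing
non-ASCII characters and reads it back unchanged, then shows what
"parameters are encoded in UTF-8" amounts to at an API boundary.

```python
import sqlite3

# Hypothetical identifier with non-ASCII characters (illustration only).
iri = "http://example.org/dürst"

# In-memory database with a made-up table for resource identifiers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO resources (id) VALUES (?)", (iri,))

# The database hands the characters back untouched.
(stored,) = conn.execute("SELECT id FROM resources").fetchone()
assert stored == iri
conn.close()

# An API that declares its parameters to be UTF-8 only has to
# encode and decode at the boundary.
wire = iri.encode("utf-8")
assert wire.decode("utf-8") == iri
```

No escaping step appears anywhere; the only convention is the
choice of encoding.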


>>>I would think that simply using UTF-8 %-encoding would be fine for these 
>>>purposes.
>>
>>Why do you think so? Would you think it would make sense to replace
>>     mailto:me@aaronsw.com
>>with something like
>>     mailto:%6d%65@%61%61%72%6f%6e%73%77.%63%6f%6d
>>or maybe even more appropriately, with something like the above
>>but using Greek letters instead of Latin ones? This is just about
>>how people using another script than Latin in their day-to-day
>>work would feel. Why should they have to use special tools
>>(having to do syntax analysis so that they can figure out
>>where a % is an escape character and when not,...) just to
>>be able to read the text, just because some tools make too
>>restrictive assumptions?
>
>I totally understand the feeling and agree with it. It's silly to have to 
>enter something in like that. But that's why I have a computer to convert 
>it for me. I already have my computer convert "Aar" to 
>"mailto:me@aaronsw.com" and "D端r" to "mailto:duerst@w3.org".

[Sorry here, my email client is not up to the job (it thinks everything
is Japanese). My lame excuse is that I'm working on Web i18n, not email
i18n.]

>I don't expect them folks to use any special tools. In fact, requiring 
>Unicode would require me to go and replace a lot of my software with 
>special i18nized tools.

Unicode is already allowed in RDF literals. Why do you say you need
additional tools if it's also allowed in resource identifiers? Do you
think fewer tools are needed with %-encoding? My guess would be that
more tools are needed, because there are then two different
representations of the same characters. Also, how many tools do you
think there are to input/edit/... UTF-8? And how many to
input/edit/... %hh? Also, what kind of software are you using? For most
of it (APIs, databases,...), the only requirement is that they pass through
all 8 bits. That's a lot easier than having to check that they only
have ASCII. So the tools you would need are really not special
internationalized tools, but just tools that don't pretend they
know better than you about your data.
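
A small sketch of the "two representations" problem, assuming
Python and its standard urllib.parse module. The identifier and the
two helper functions (passes_through, ascii_only) are hypothetical
and only there to show the contrast; they are not taken from any
spec or existing tool.

```python
from urllib.parse import quote, unquote

# Hypothetical identifier, for illustration only.
iri = "http://example.org/dürst"

# UTF-8 %-encoding: each byte of a non-ASCII character becomes %hh.
uri = quote(iri, safe=":/")
assert uri == "http://example.org/d%C3%BCrst"

# Same characters, two different strings: a plain comparison fails
# until some tool converts one form into the other.
assert uri != iri
assert unquote(uri) == iri

# Escaping plain ASCII, as in the mailto example quoted earlier.
escaped = "".join(f"%{b:02x}" for b in b"me")
assert escaped == "%6d%65"
assert unquote(escaped) == "me"


def passes_through(data: bytes) -> bytes:
    """An 8-bit-clean tool: nothing to inspect."""
    return data


def ascii_only(data: bytes) -> bytes:
    """An ASCII-only tool: has to look at every byte."""
    if any(b > 0x7F for b in data):
        raise ValueError("non-ASCII byte found")
    return data


utf8 = iri.encode("utf-8")
assert passes_through(utf8) == utf8
try:
    ascii_only(utf8)
except ValueError:
    pass  # the restrictive tool rejects perfectly good data
```

The pass-through version is the shorter one, and it is also the one
that needs no knowledge of the data it carries.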


Regards,    Martin.
Received on Monday, 27 May 2002 02:34:07 UTC