Re: why use IRIs?

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Wed, 04 Jul 2012 17:49:19 +0900
Message-ID: <4FF4038F.5050006@it.aoyama.ac.jp>
To: Mark Nottingham <mnot@mnot.net>
CC: public-iri@w3.org

Hello Mark,

On 2012/07/04 15:13, Mark Nottingham wrote:
> I tend to agree with Peter.
> The experience of using IRIs as identifiers in Atom was, IME, a disaster.

Can you be specific? Can you provide pointers?

> Identifiers need to be resistant to spoofing and mistakes.

It's easy to create spoofing identifiers using ASCII/English only.

It's also not too difficult to create spoofing/mistake-resistant 
identifiers in other scripts or languages, for people who are better 
versed in these scripts/languages. This may be difficult to understand 
for "English-centric" people, but it's indeed the case.

> Björn said:
>> How would you like it if URIs could use only 20 of the 26 letters in the
>> english alphabet and you would have to encode, decode and convert them
>> all the time, or use awkward transliterations to avoid having to do so?
> URIs already have a constrained syntax; you can't use certain characters in certain places.

Yes. But not being able to use certain punctuation is different from not 
being able to use characters in the basic alphabet/character repertoire 
of the language. It's easy to replace spaces with hyphens or whatever. 
It's a different thing to replace one letter with another, or just drop it.

> As long as people can put IRIs into HTML and browser address bars, I don't think they'll care.
> Martin said:
>> I think the real motivation would be people looking at HTTP traces and
>> preferring to see Unicode rather than lots of %HH strings. Of course the
>> number of people looking at HTTP traces is low, and they are not end users.
> Is this use case really worth the pain,

For that specific case, I'm not sure. That's why I used "would". But I 
also don't think the pain would be that high.

> inefficiency,

Conversion would indeed cost some cycles. But using raw bytes instead of 
%-encoding would save bytes (which, as far as I have followed the SPDY 
debates, seems these days to be the more important side of the 
tradeoff).
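
To put rough numbers on the byte savings, here is a minimal sketch that 
compares the size of a path sent as raw UTF-8 with the same path sent 
%-encoded. The helper name wire_sizes and the sample paths are made up 
for illustration, and this counts only the path itself, not a full 
request.

```python
from urllib.parse import quote


def wire_sizes(path):
    """Return (raw UTF-8 byte count, %-encoded byte count) for a path.

    Hypothetical helper for illustration only.
    """
    raw = len(path.encode("utf-8"))
    # quote() escapes every non-ASCII byte as %HH; "/" is left alone.
    escaped = len(quote(path, safe="/").encode("ascii"))
    return raw, escaped


for sample in ("/wiki/Dürst", "/wiki/日本語"):
    raw, escaped = wire_sizes(sample)
    print(sample, raw, escaped)
```

Each non-ASCII byte costs three bytes once it is %-encoded, so the 
difference grows with the share of non-ASCII characters in the path.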

> and very likely security vulnerabilities caused by transcoding from IRIs to URIs and back when hopping from HTTP 2.0 to 1.1 and back? I don't think so.

There are quite a lot of places where security blunders can happen. That 
conversion step wouldn't be the first one and wouldn't be the last one. 
And using %-encoding for basic ASCII characters is already allowed 
today, so the basic security vulnerability (firewalls can't just check 
on character strings) already exists today.
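
The point about %-encoded ASCII can be shown with a small sketch. The 
blocked prefix and both filter functions are hypothetical, not taken 
from any real firewall; a real filter would also have to deal with 
things such as repeated encoding and path normalization.

```python
from urllib.parse import unquote

BLOCKED = "/admin"  # hypothetical rule, for illustration only


def naive_filter_blocks(request_path):
    """Match the rule against the path exactly as it came over the wire."""
    return BLOCKED in request_path


def decoding_filter_blocks(request_path):
    """Undo %-encoding once, then match the rule."""
    return BLOCKED in unquote(request_path)


# "%61" is just the letter "a", so this path names the same resource.
disguised = "/%61dmin"
print(naive_filter_blocks(disguised))     # False
print(decoding_filter_blocks(disguised))  # True
```

No non-ASCII character is involved here, so a filter that wants to be 
correct already has to decode before it compares.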

> My English-centric .02; ŸṀṂṼ.

您里可变 (this is not real Chinese, but just four roughly corresponding 
characters put together).

Regards,   Martin.
Received on Wednesday, 4 July 2012 08:49:55 UTC
