- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Wed, 04 Jul 2012 17:49:19 +0900
- To: Mark Nottingham <mnot@mnot.net>
- CC: public-iri@w3.org
Hello Mark,

On 2012/07/04 15:13, Mark Nottingham wrote:
> I tend to agree with Peter.
>
> The experience of using IRIs as identifiers in Atom was, IME, a disaster.

Can you be specific? Can you provide pointers?

> Identifiers need to be resistant to spoofing and mistakes.

It's easy to create spoofing identifiers using ASCII/English only. It's also not too difficult to create spoofing/mistake-resistant identifiers in other scripts or languages, for people who are better versed in these scripts/languages. This may be difficult to understand for "English-centric" people, but it's indeed the case.

> Björn said:
>
>> How would you like it if URIs could use only 20 of the 26 letters in the
>> english alphabet and you would have to encode, decode and convert them
>> all the time, or use awkward transliterations to avoid having to do so?
>
> URIs already have a constrained syntax; you can't use certain characters in certain places.

Yes. But not being able to use certain punctuation is different from not being able to use characters in the basic alphabet/character repertoire of the language. It's easy to replace spaces with hyphens or whatever. It's a different thing to replace one letter with another, or just drop it.

> As long as people can put IRIs into HTML and browser address bars, I don't think they'll care.
>
> Martin said:
>
>> I think the real motivation would be people looking at HTTP traces and
>> preferring to see Unicode rather than lots of %HH strings. Of course the
>> number of people looking at HTTP traces is low, and they are not end users.
>
> Is this use case really worth the pain,

For that specific case, I'm not sure. That's why I used "would". But I also don't think the pain would be that high.

> inefficiency,

Conversion would indeed cost some cycles. But using raw bytes instead of %-encoding would save bytes (which, these days, as far as I have followed the SPDY debates so far, seems to be the more important side of the tradeoff).

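
A minimal Python sketch of the size side of this tradeoff, assuming raw bytes means UTF-8. The path "/wiki/東京" and the helper name wire_sizes are made up for illustration; only urllib.parse.quote from the standard library is relied on.

```python
from urllib.parse import quote, unquote


def wire_sizes(path: str) -> tuple[int, int]:
    """Return (raw UTF-8 byte count, %-encoded byte count) for a path.

    wire_sizes is a hypothetical helper, not part of any HTTP library.
    """
    raw = path.encode("utf-8")        # what would go on the wire as raw bytes
    escaped = quote(path, safe="/")   # what goes on the wire as a URI today
    return len(raw), len(escaped.encode("ascii"))


if __name__ == "__main__":
    path = "/wiki/東京"  # made-up example path
    print(quote(path, safe="/"))
    print(wire_sizes(path))
    # The conversion itself is lossless in this direction and back.
    assert unquote(quote(path, safe="/")) == path
```

Each non-ASCII byte costs three bytes once %-encoded, so a character that takes three bytes in UTF-8 takes nine in the URI form.
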
> and very likely security vulnerabilities caused by transcoding from IRIs to URIs and back when hopping from HTTP 2.0 to 1.1 and back?

I don't think so. There are quite a lot of places where security blunders can happen. That conversion step wouldn't be the first one and wouldn't be the last one. And using %-encoding for basic ASCII characters is already allowed today, so the basic security vulnerability (firewalls can't just check on character strings) already exists today.

> My English-centric .02; ŸṀṂṼ.

您里可变 (this is not real Chinese, but just four roughly corresponding characters put together).

Regards,   Martin.

Received on Wednesday, 4 July 2012 08:49:55 UTC