- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Mon, 25 Dec 2006 14:21:40 +0900
- To: Michael Everson <everson@evertype.com>, <idna-update@alvestrand.no>
- Cc: public-iri@w3.org
At 23:00 06/12/24, Michael Everson wrote:
>At 22:00 +0900 2006-12-24, Martin Duerst wrote:
>It is Kurdish, and the two letters are for other functional reasons being proposed for addition to the standard. So for the sake of argument, assume that this particular reason does not apply.
>
>Why then would mixing Latin and Greek and Cyrillic at (at least) the same level not be disallowed in IDNs and IRIs to avoid security problems?

For IDNs, we are discussing this here, and even if it looks like the current tendency is to not do this at the protocol level, I'm rather sure that registries and browsers will do something about it.

For IRIs, the situation is completely different. IRIs (same as URIs) are 'meta-syntax', a system that can encompass all kinds of different syntactic conventions. There are extremely few things you can actually check in an IRI as such. If you know the scheme (such as http:, ftp:, mailto:,...), there are scheme-specific rules that can be used for checking, but you can never assume that a scheme is known everywhere. Implementing all these checks would be expensive, and is better delegated to resolution, where knowledge of the scheme is required anyway.

Also, in a 'typical' (e.g. http: or ftp:) IRI, the place where attacks can take place is the domain name. Anything else is just between the server and the client. As an example, assume that you create a font that makes the distinction between Latin and Cyrillic very easy, and you create a Web page for it at http://www.evertype.com/fonts/latinCYRILLIC.html (where the 'CYRILLIC' part is actually in Cyrillic). Because it's your Web server, nobody will be able to spoof you, and nobody should be able to tell you whether this particular page name is a good idea or not (well, your customers may tell you it's difficult to type, anyway).

Going one step further, one important part of (URIs and) IRIs is the query part. You wouldn't want to prohibit users from submitting queries containing keywords in different scripts, or would you? Take a look at the following query (URI):

http://www.google.com/search?q=russian+%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B9%D0%B8

(or the following, the same as above but as an IRI:

http://www.google.com/search?q=russian+русскйи).

Regards, Martin.

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
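
The URI and IRI forms of the query quoted in the message differ only in that the non-ASCII characters of the IRI are percent-encoded as UTF-8 bytes in the URI. The following is a minimal Python sketch of that mapping, together with a rough script check applied to the domain name only, since the message argues that the domain name is where spoofing matters. The helper names `letter_scripts` and `iri_to_uri` are hypothetical, the script guess based on Unicode character names is a heuristic and not how any registry or browser is known to work, and conversion of non-ASCII host names (IDNA) is deliberately left out.

```python
import unicodedata
from urllib.parse import quote, urlsplit, urlunsplit


def letter_scripts(text):
    """Heuristic (assumption): take the first word of each letter's
    Unicode name, e.g. 'LATIN' or 'CYRILLIC', as its script."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


def iri_to_uri(iri):
    """Percent-encode the UTF-8 bytes of non-ASCII characters in the
    path, query and fragment. The host is passed through unchanged."""
    parts = urlsplit(iri)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&+%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


iri = "http://www.google.com/search?q=russian+русскйи"
host = urlsplit(iri).hostname

print(letter_scripts(host))                 # scripts found in the domain name
print(letter_scripts(urlsplit(iri).query))  # the query may mix scripts freely
print(iri_to_uri(iri))                      # the URI form quoted in the message
```

Only the host is inspected for mixed scripts here. The path and query are simply encoded and handed to the server, which matches the point that everything after the domain name is between the server and the client.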
Received on Monday, 25 December 2006 05:42:34 UTC