- From: Anne van Kesteren <annevk@opera.com>
- Date: Sun, 29 Mar 2009 15:06:35 +0200
On Sun, 29 Mar 2009 15:01:51 +0200, Giovanni Campagna <scampa.giovanni at gmail.com> wrote:
> 2009/3/29 Anne van Kesteren <annevk at opera.com>:
>> I'm not sure if you're correct about those differences, but even if you
>> are they are not the only differences. E.g. LEIRIs perform
>> normalization if the input encoding is non-Unicode. URLs do not. URLs
>> can encode their query
>> component per the input encoding (and do so for HTML and some APIs).
>> LEIRIs cannot.
>
> What is the problem with normalization? Is there a standard for
> conversion to non-Unicode to Unicode?
> I guess no, so normalization (which should always be done) is perfectly
> legal.

It's about Unicode Normalization. (And it should not always be done.)

> In addition, IRIs are defined as a sequence of Unicode codepoints. It
> does not matter how those codepoints are stored (ASCII, ISO-8859-1,
> UTF-8), only the Unicode version of them.

Please read the IRI specification again. Specifically section 3.1.

> This is the same as URL5s, by the way, because none of them is defined
> on octets and both use the RFC3986 method for percent-encoding (using
> UTF-8)

No, it's not always using UTF-8.

-- 
Anne van Kesteren
http://annevankesteren.nl/
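
The two points in dispute above (percent-encoding a query per the input encoding rather than always as UTF-8, and Unicode Normalization changing the code points) can be illustrated with a minimal sketch using Python's standard library. The helper name `encode_query_value` and the sample string are illustrative and do not come from the thread, and NFC is assumed here to be the normalization form under discussion.

```python
import unicodedata
from urllib.parse import quote


def encode_query_value(value: str, encoding: str = "utf-8") -> str:
    """Percent-encode a query value after converting it to bytes in `encoding`.

    Hypothetical helper for illustration only.
    """
    return quote(value, safe="", encoding=encoding, errors="strict")


# The same visible text as two different code point sequences.
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# The percent-encoded result depends on the encoding used for the bytes.
utf8_query = encode_query_value(precomposed, "utf-8")
latin1_query = encode_query_value(precomposed, "iso-8859-1")

# Without normalization the decomposed form encodes differently;
# applying NFC first makes it match the precomposed form.
raw_query = encode_query_value(decomposed, "utf-8")
nfc_query = encode_query_value(unicodedata.normalize("NFC", decomposed), "utf-8")

print(utf8_query, latin1_query, raw_query, nfc_query)
```

The sketch only shows that the encoding choice and the normalization step each change the resulting octets; it does not claim to reproduce the processing rules of any of the specifications mentioned.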
Received on Sunday, 29 March 2009 06:06:35 UTC