- From: Robert J Burns <rob@robburns.com>
- Date: Tue, 3 Feb 2009 12:38:32 -0600
- To: Aryeh Gregor <Simetrical+w3c@gmail.com>
- Cc: public-i18n-core@w3.org, jonathan@jfkew.plus.com, W3C Style List <www-style@w3.org>
Hi Aryeh,

On Feb 3, 2009, at 8:53 AM, Aryeh Gregor wrote:

> On Tue, Feb 3, 2009 at 5:04 AM, Robert J Burns <rob@robburns.com> wrote:
>> The problem with this is that there would have to be a prior agreement so
>> that a Unicode processing application could count on everything received
>> already as NFC and that's simply not the case. If a Unicode UA is incapable
>> of processing NFD (which also implies it cannot process NFC characters that
>> are combining characters) then it would be up to that application to convert
>> internally to something it could handle (just what conversion it would do, I
>> don't know).
>
> Who's talking about a Unicode UA being unable to process NFD?

Henri raised this issue right before the fragment you quote from me. There Henri says:

>>> The central reason for using NFC for interchange (i.e. what goes over
>>> HTTP) is that legacy software (including the text rendering code in
>>> legacy browsers) works better with NFC.

To me that implies Henri thinks we need to promote NFC to help legacy software that cannot process combining characters. But that forgets that even NFC has combining characters.

> The question on the table seems to be whether UAs should normalize all
> input to NFC when they parse it. This would permit them to process
> NFC, NFD, or any other normalized or non-normalized input. They would
> then probably end up sending responses like form data in NFC even if
> they received the original input in NFD. If the server prefers to use
> NFD internally, it's up to the server to then convert back to NFD on
> its end.

Yes, that's precisely what my messages were arguing for [1].
This needs to be done at the parser level, and XML's dependence on Unicode implies it should probably already be happening in XML parsers now (i.e., it is an implementation error not to canonically normalize one way or the other for string comparisons).

> We aren't really talking about transmission formats here, AFAICT, or
> at least that wasn't the original question. The question is whether
> it's acceptable for browsers to internally normalize all input somehow
> (to NFC, NFD, whatever) as soon as it's received, so that they can
> ensure that they make correct comparisons according to the Unicode
> standard. This is relevant to CSS because it seems to be the best way
> of ensuring that CSS comparisons aren't normalization-sensitive.

I agree.

> I'm not clear on what exactly the objections are to that, other than
> possibly violating the XML standard (it would be surprising to me if
> that did violate XML).

Quite the opposite: I think it violates the XML standard to fail to compare canonically equivalent strings and determine that they are equivalent.

> The only practical objection I can see is that
> some sites might be broken and not do normalization themselves. You
> could have something like user registers with a name in NFD (or
> entirely unnormalized) in non-normalizing browser -> site saves to
> database -> same user tries to log in later in a normalizing browser
> -> login fails because site thinks the names are different. I don't
> know whether this would be a problem in practice.

Fixing it in the implementations (XML parsers, CSS parsers, and otherwise) begins to fix that problem.

Anne wrote:

> (As far as I can tell XML is Unicode
> Normalization agnostic. It merely recommends authors to do a certain
> thing. We can certainly recommend authors to do a certain thing in
> HTML and CSS too...)

XML is not Unicode agnostic. Unicode is a normative reference in terms of text handling. So an XML UA is by definition also a Unicode UA.
That means that an implementation needs to have some reason for comparing two byte-wise unequal though canonically equivalent strings and determining they do not match. I haven't heard anyone here say why an XML processor needs to support (and therefore promote) such errors.

Take care,
Rob

[1]: <http://lists.w3.org/Archives/Public/public-i18n-core/2009JanMar/0120.html>. This only went to the I18N list and not the CSS list.
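
A minimal sketch of the comparison problem discussed in this message, assuming Python's standard `unicodedata` module. The helper name `canonically_equal` is hypothetical and is not taken from any XML or CSS parser; it only illustrates normalizing both sides before comparing. The last lines also illustrate the point that NFC text can still contain combining characters.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE (NFC spelling)
decomposed = "e\u0301"   # U+0065 + U+0301 COMBINING ACUTE ACCENT (NFD spelling)

# A plain code-point comparison treats the two spellings as different strings.
naive_match = precomposed == decomposed

# Normalizing both sides first makes the comparison normalization-insensitive.
normalized_match = canonically_equal(precomposed, decomposed)

# NFC does not remove all combining characters: "q" + U+0323 COMBINING DOT BELOW
# has no precomposed form, so the combining mark survives NFC normalization.
nfc_q = unicodedata.normalize("NFC", "q\u0323")
still_combining = any(unicodedata.combining(ch) != 0 for ch in nfc_q)
```

Normalizing to NFD on both sides instead would work equally well for the comparison; what matters is that both operands go through the same canonical normalization form.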
Received on Tuesday, 3 February 2009 18:39:18 UTC