Unicode Normalization

On Mon, 02 Feb 2009 22:04:47 +0100, Phillips, Addison <addison@amazon.com>  
wrote:
> The question here is one of interpretation. Anne points out that, at  
> least theoretically, it is possible to create XML document schemas that  
> define two semantically identical names that are encoded using different  
> code point sequences. This, of course, is an Extremely Bad Idea, since,  
> among other things, such a document might not live through a transcoding  
> to another character encoding or other forms of processing. Although  
> Anne pointed to XML 1.1, in fact, XML 1.0 5e also includes the same  
> recommendations:
>
>   http://www.w3.org/TR/xml/#sec-suggested-names
>
> The real question is: what feature is more important to preserve? The  
> non-normalizability of XML names (which is deprecated anyway)?

I never pointed to XML 1.1. I did point out that the above section was  
non-normative and for some reason had a normative reference to Unicode  
Normalization, which seems like a bug.

I don't really care whether it's a bad idea or not; it would be a bug in our  
software if we normalized on input unless XML was somehow changed.


And I'll try to make my other point a bit more explicit: I do not think  
that www-style is the appropriate venue for this discussion. If we cannot  
do normalization on XML or HTML, doing normalization (say NFC) on CSS  
would make it not work with certain XML documents that e.g. use NFD.  
(Doing the normalization during comparison is not really going to fly, I  
think.)
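
To illustrate the mismatch, here is a minimal Python sketch. The name  
"café" is a made-up example, and normalizing the selector on input is the  
hypothetical behaviour under discussion, not something any spec requires:

```python
import unicodedata

# Hypothetical element name in an XML document that stores names in NFD:
# "cafe" followed by U+0301 COMBINING ACUTE ACCENT (5 code points).
xml_name = "cafe\u0301"

# The same name as typed in a style sheet, then normalized to NFC on
# input, which composes "e" + U+0301 into U+00E9 (4 code points).
css_selector = unicodedata.normalize("NFC", "cafe\u0301")

# Plain code-point comparison, with no normalization applied.
matches_raw = css_selector == xml_name

# Comparison with both sides normalized at comparison time.
matches_normalized = (
    unicodedata.normalize("NFC", css_selector)
    == unicodedata.normalize("NFC", xml_name)
)

print(matches_raw, matches_normalized)
```

With code-point comparison the NFC selector no longer matches the NFD  
name; it only matches if both sides are normalized when compared.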

If changes are required here we need to change CSS, but also HTML and XML.  
(And maybe ECMAScript; I do not know the details.)


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Tuesday, 3 February 2009 09:23:10 UTC