RE: XML 5e, Unicode Normalization, and CharMod: Your thoughts sought...

Hello Martin,

(chair hat off)

Thanks for taking the time to reply.

> - I suggest to split the first sentence into two. Unicode says...
> However, XML parsed entities...

That's a good idea.

> - Full normalization is the right thing in general, but is not always
> appropriate. In particular, we found some cases with SVG and fonts
> where it may not work. But that's okay, because you have a SHOULD.

Okay, but for this specific case (XML parsed entities), isn't it always appropriate? I know the font case doesn't apply here, but I'm curious about the SVG case (I don't recall what it is). If full normalization is not always appropriate for XML document elements, then even the SHOULD here is questionable, if not untenable.

> - The problem that CharMod_Norm isn't in a stable state is still
> around.

Agreed. What concerns me now is not the explanatory material in CharMod-Norm ("what is normalization", etc.), but the conclusions we draw and the requirements we intend to enforce as a result. I think the XML issue will drive us to a conclusion on what those should be, and from there we should be able to finalize the document.

> As CharMod_Norm is still being worked on, we can (and will have to)
> adjust that as necessary. However, except for moving from MUST to
> SHOULD, I don't see a conflict. Note that step 1 of C312 says
> "MUST be performed by the producers of the strings to be compared".

If we insert the warning we are proposing into XML, I think we will have cast serious doubt on all of our normative language in CharMod-Norm. I don't think we can even say you "SHOULD" normalize if what you are comparing are XML parsed entities. In fact, we should say "SHOULD NOT", because normalization will break when processing valid XML.
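
A minimal Python sketch of one way such breakage can arise, assuming the concern is replacement text that begins with a combining character. The strings `doc_text` and `entity_text` and the helper `is_nfc` are hypothetical, chosen only for illustration. Each piece is in NFC on its own, but the text a processor sees after inclusion is not:

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s


# Hypothetical document text that immediately precedes an entity reference.
doc_text = "cafe"
# Hypothetical replacement text of that entity: U+0301 COMBINING ACUTE ACCENT.
entity_text = "\u0301"

# What a processor sees once the entity has been included.
combined = doc_text + entity_text

# Normalizing after inclusion composes "e" + U+0301 into U+00E9.
normalized = unicodedata.normalize("NFC", combined)

print(is_nfc(doc_text), is_nfc(entity_text), is_nfc(combined))
print(combined == normalized)
```

In this sketch, a producer that normalized each piece separately and a consumer that normalizes after inclusion end up with different code point sequences, so a plain comparison between them fails.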

I'm concerned that, effectively, both early and late normalization are dead.

Addison

Received on Friday, 8 May 2009 15:58:39 UTC