added I18N note to xml 1.0 [was: Minutes for XML Core WG telcon of 2009 June 3] from Grosso, Paul on 2009-06-09 (public-xml-core-wg@w3.org from June 2009)

From: Grosso, Paul <pgrosso@ptc.com>
Date: Tue, 9 Jun 2009 10:06:44 -0400
To: <public-xml-core-wg@w3.org>
Message-ID: <CF83BAA719FD2C439D25CBB1C9D1D3020FE29E0D@HQ-MAIL4.ptcnet.ptc.com>

> -----Original Message-----
> From: public-xml-core-wg-request@w3.org 
> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Simon Pieters
> Sent: Wednesday, 2009 June 03 12:18
> To: Norman Walsh; public-xml-core-wg@w3.org
> Subject: Re: Minutes for XML Core WG telcon of 2009 June 3
> 
> On Wed, 03 Jun 2009 17:18:14 +0200, Norman Walsh 
> <ndw@nwalsh.com> wrote:
> 
> > Henry thought it was verbose but ok. Liam suggests that the sentence
> >
> > [[
> > A document is still well-formed, even if it is not in a 
> normalized form.
> > ]]
> >
> > should be changed to.
> >
> > [[
> > A document may still be well-formed even if it is not in a 
> normalized  
> > form.
> > ]]
> >
> > With this proposed change, let's put this in countdown.
> 
> Is this intended to be an RFC2119 "MAY"? That doesn't make 
> much sense to  
> me. Maybe "might" is a better word here.

I too question the wisdom of adding a non-RFC2119 "may".

Also, just per (what I understand as) standard English usage, 
we're talking about logical possibilities here, not permissions, so
"may" isn't as appropriate as "can".

Taking the prerogative of a chair who missed the last telcon, I suggest 
that the following wording be put in countdown:

<added-note>
_Unicode_ (rule C06) says that canonically equivalent sequences of
characters ought to be treated as identical. However, XML _parsed
entities_ (including _document entities_) that are canonically
equivalent according to Unicode but which use distinct code point
(character) sequences are considered distinct by XML processors.
Therefore, all XML parsed entities SHOULD be created in a "fully
normalized" form per _[CharMod-Norm]_. Otherwise the user might
unknowingly create canonically equivalent but unequal sequences that
appear identical to the user but which are treated as distinct by XML
processors.

A document can still be well-formed, even if it is not in a normalized
form. XML processors MAY verify that the document being processed is in
a fully-normalized form and report to the application whether it is or
not.
</added-note>
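
To illustrate the problem the note describes, here is a minimal sketch in
Python using only the standard library. The sample strings and the helper
name is_nfc are assumptions made for illustration. The check covers NFC
only, which is narrower than "fully normalized" in the [CharMod-Norm]
sense, so treat it as an approximation rather than a conformance test.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Two canonically equivalent spellings of "café" (illustrative data):
composed = "caf\u00e9"      # é as one code point, U+00E9
decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT


def is_nfc(text: str) -> bool:
    """Hypothetical helper: True if text is already in NFC.

    NFC is only part of what CharMod-Norm calls "fully normalized".
    """
    return unicodedata.normalize("NFC", text) == text


# Both documents are well-formed, so the parser accepts each one.
text_a = ET.fromstring("<p>" + composed + "</p>").text
text_b = ET.fromstring("<p>" + decomposed + "</p>").text

# The parser hands back the code points as written, so the two values
# compare unequal even though they render identically.
same_as_parsed = text_a == text_b

# Normalizing both sides to NFC makes them compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", text_a)
    == unicodedata.normalize("NFC", text_b)
)

print(same_as_parsed, same_after_nfc, is_nfc(text_a), is_nfc(text_b))
```

A check like is_nfc is roughly what a processor exercising the MAY in the
second paragraph could report to the application, without rejecting the
document.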

paul

Received on Tuesday, 9 June 2009 14:07:39 UTC