- From: Martin Gudgin <mgudgin@microsoft.com>
- Date: Sun, 25 Aug 2002 18:02:45 -0700
- To: "Addison Phillips [wM]" <aphillips@webmethods.com>, <asirv@webmethods.com>
- Cc: "W3C Public Archive" <www-archive@w3.org>, "Jean-Jacques Moreau" <moreau@crf.canon.fr>, "Marc Hadley" <marc.hadley@sun.com>, "Nilo Mitra" <EUSNILM@am1.ericsson.se>, "Noah Mendelson" <noah_mendelsohn@us.ibm.com>, "Henrik Frystyk Nielsen" <henrikn@microsoft.com>
Addison,
Thanks very much for your detailed comments. I've commented inline.
Martin
> -----Original Message-----
> From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
> Sent: 24 August 2002 00:05
> To: Martin Gudgin; asirv@webmethods.com
> Cc: W3C Public Archive; Jean-Jacques Moreau; Marc Hadley;
> Nilo Mitra; Noah Mendelson; Henrik Frystyk Nielsen
> Subject: RE: Algorithm for mapping an application defined
> name to an XML name
>
>
> Hi Martin,
>
> Thanks for the note. It's been a while since I thought about this.
Sorry it's taken us so long to incorporate your feedback.
>
> My edits were done from the original proposal. Although I
> modified the text to be more correct about various Unicode
> issues, I didn't change the structure of the original at all.
> (FWIW, I would have designed and written it differently. And
> I hate standards that obfuscate what's going on as much as
> this one does. It not being my document, I didn't rewrite it.
> I just edited the text to be more correct.)
If you have ( or have time to produce ) a more readable version, I'm
sure the editorial team would be very grateful.
>
> The TAG item is confusing. Therefore I think the use of the
> word "TAG" in production rule 2 should be replaced with "T".
> I would rewrite #2 slightly differently than Martin does to be:
>
> "2. Let <i>T</i> be a name in an application, where the name
> <i>T</i> is a sequence of characters in the character set,
> encoding, and namespace of the application." // note
> addition of the encoding here
I agree about TAG vs T. I'm not sure how namespace plays here.
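Addison's point about naming the encoding can be shown with a small Python sketch. The byte string and variable names below are hypothetical, not from the draft. Until the application's encoding is fixed, a sequence of bytes does not determine the sequence of characters that makes up T:

```python
# Minimal sketch (hypothetical data): the same bytes give different
# character sequences, and hence different names T, under different encodings.
raw = b"caf\xc3\xa9"  # hypothetical application-defined name as raw bytes

as_utf8 = raw.decode("utf-8")      # 4 characters: c, a, f, U+00E9
as_latin1 = raw.decode("latin-1")  # 5 characters: c, a, f, U+00C3, U+00A9

print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
```

Only after decoding does T become a well-defined sequence of Unicode scalar values to feed into the mapping.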
>
> For Martin Durst's comments, I note that he appears to be
> commenting on the original draft. My comments on his comments are:
>
> 1. Yes, it needs some explanatory text.
I'll add some :-)
> 2. I didn't change "Prefix be computed, etc." as it was
> beyond the scope of my review. If it is looked up instead of
> being computed, you could change it to say that. Basically,
> the appendix is saying: "get your prefix somehow, this stuff
> deals with the 'local part'"
I've changed 'computed' to 'determined'
> 3. The left-to-right thing was
> fixed by using most-to-least significant (e.g. memory order).
Agreed.
> 4. Already commented on TAG. By all means change that.
> 5. You could change M to UNI if you want. I kept the original
> notation here. Doesn't matter which you choose.
Agreed. I'm sticking with M.
> 6. Yes, the
> note should be added. I missed that on my edit. Note that
> Martin's point about missing characters is dealt with (but
> not very clearly) by 5.2.
OK, I've added that in.
> 7. The various edits, etc., have to
> do with changing the structure of the document as Martin
> suggests. I'm not wild about the structure either, as
> indicated. You could follow his edits which do not change the
> end result.
OK, I think I'm happier that your work hasn't changed anything.
> 8. Say explicitly that hex digits always appear
> as uppercase.
I think the BNF for hexDigits already does this.
> 9. Add examples if you so desire.
I've added some examples ( although none outside the BMP )
>
> For Mike Champion's comments, I note that he also appears to
> be commenting on the draft I revised???:
>
> 1. Referencing UCS-4, an obsolete encoding of Unicode, is one
> of the things I changed. I used Unicode Scalar Value (that
> is, code points) to get away from the vagaries of the
> different encodings. Although UTF-32/UCS-4 are essentially
> the scalar value, there is no need to get into things like
> Big/LittleEndianness and other stuff.
OK, we'll stick with USV.
>
> I should note that there are non-characters inside the
> 0..10FFFF range. Saying USVs avoids those without a lot of
> text to explain it, so long as you look up the precise
> Unicode definition of all this stuff (for example in CharMod).
OK, we'll reference the appropriate spec. Would that be CharMod? Or one
of the Unicode specs?
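The two points in item 1 can be checked with a short Python sketch; the helper names are hypothetical. A scalar value is the same number whichever UTF-32 byte order is used to serialize it. Note that, per the Unicode Standard, a scalar value is any code point in 0..10FFFF except the surrogates U+D800..U+DFFF; noncharacters such as U+FFFE are still scalar values, so rejecting them would need a separate check:

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: 0..10FFFF minus surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters (hypothetical helper name)."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # U+FDD0..U+FDEF, plus the last two code points (xxFFFE, xxFFFF) of each plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


ch = "\U0010FFFA"
print(hex(ord(ch)))                  # the scalar value, no byte order involved
print(ch.encode("utf-32-be").hex())  # big-endian serialization
print(ch.encode("utf-32-le").hex())  # little-endian serialization
```

Working with `ord()` values keeps the mapping independent of any particular encoding form.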
>
> 2. The use of U+ notation is slightly confusing in the text.
> I would change this sequence (at 5.4.1.1.a):
>
> "Let U1, U2, ... , U8 be the eight hex digits [PROD: 11] such
> that Ci is "U+" U1 U2 ... U8 in the Unicode Scalar Value"
>
> to be:
>
> "Let U1, U2, ..., U8 be the eight hex digits [PROD: 11] in
> the 32-bit Unicode Scalar Value of Ci. For example, a
> character whose scalar value is
> U+10FFFA would be represented by the sequence U1=0 U2=0 U3=1
> U4=0 U5=F U6=F U7=F U8=A ('0010FFFA')"
>
> Note that this makes clear that the encoding is wasteful. The
> first two hex digits will never be used for a value other than "00".
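The U1..U8 rule quoted above can be sketched in Python. The function name `usv_hex_digits` is hypothetical, and this assumes the eight-digit form from the proposed text: digits run from most to least significant and are always uppercase:

```python
HEX_DIGITS = "0123456789ABCDEF"  # uppercase only


def usv_hex_digits(ch: str) -> list[str]:
    """Return U1..U8, the eight uppercase hex digits of ch's scalar value,
    most significant first (hypothetical helper)."""
    cp = ord(ch)
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not Unicode scalar values")
    # Take one 4-bit nibble at a time, from U1 (bits 28-31) down to U8 (bits 0-3).
    return [HEX_DIGITS[(cp >> shift) & 0xF] for shift in range(28, -1, -4)]


print(usv_hex_digits("\U0010FFFA"))
print("".join(usv_hex_digits("A")))
# The largest scalar value is 0x10FFFF, so U1 and U2 are always "0".
print("".join(usv_hex_digits("\U0010FFFF")))
```

The nibble loop spells out the digit order; in practice `format(ord(ch), "08X")` yields the same string.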
>
> Good luck with your editing.
Thanks, and thank you for all your input so far,
Regards
Martin Gudgin
Received on Sunday, 25 August 2002 21:03:17 UTC