- From: Martin Gudgin <mgudgin@microsoft.com>
- Date: Sun, 25 Aug 2002 18:02:45 -0700
- To: "Addison Phillips [wM]" <aphillips@webmethods.com>, <asirv@webmethods.com>
- Cc: "W3C Public Archive" <www-archive@w3.org>, "Jean-Jacques Moreau" <moreau@crf.canon.fr>, "Marc Hadley" <marc.hadley@sun.com>, "Nilo Mitra" <EUSNILM@am1.ericsson.se>, "Noah Mendelson" <noah_mendelsohn@us.ibm.com>, "Henrik Frystyk Nielsen" <henrikn@microsoft.com>
Addison, Thanks very much for your detailed comments. I've commented inline Martin > -----Original Message----- > From: Addison Phillips [wM] [mailto:aphillips@webmethods.com] > Sent: 24 August 2002 00:05 > To: Martin Gudgin; asirv@webmethods.com > Cc: W3C Public Archive; Jean-Jacques Moreau; Marc Hadley; > Nilo Mitra; Noah Mendelson; Henrik Frystyk Nielsen > Subject: RE: Algorithm for mapping an application defined > name to an XML name > > > Hi Martin, > > Thanks for the note. It's been awhile since I thought about this. Sorry it's taken us so long to incorporate your feedback. > > My edits were done from the original proposal. Although I > modified the text to be more correct about various Unicode > issues, I didn't change the structure of the original at all. > (FWIW, I would have designed and written it differently. And > I hate standards that obfuscate what's going on as much as > this one does. It not being my document, I didn't rewrite it. > I just edited the text to be more correct.) If you have ( or have time to produce ) a more readable version, I'm sure the editorial team would be very grateful. > > The TAG item is confusing. Therefore I think the use of the > word "TAG" in production rule 2 should be replaced with "T". > I would rewrite #2 slightly differently than Martin does to be: > > "2. Let <i>T</i> be a name in an application, where the name > <i>T</i> is a sequence of characters in the character set, > encoding, and namespace of the application." // note > addition of the encoding here I agree about TAG vs T. I'm not sure how namespace plays here. > > For Martin Durst's comments, I note that he appears to be > commenting on the original draft. My comments on his comments are: > > 1. Yes, it needs some explanatory text. I'll add some :-) > 2. I didn't change "Prefix be computed, etc." as it was > beyond the scope of my review. If it is looked up instead of > being computed, you could change it to say that. Basically, > the appendix is saying: "get your prefix somehow, this stuff > deals with the 'local part'" I've changed 'computed' to 'determined' > 3. The left-to-right thing was > fixed by using most-to-least significant (e.g. memory order). Agreed. > 4. Already commented on TAG. By all means change that. 5. You > could change M to UNI if you want. I kept the original > notation here. Doesn't matter which you choose. Agreed. I'm sticking with M. > 6. Yes, the > note should be added. I missed that on my edit. Note that > Martin's point about missing characters is dealt with (but > not very clearly) by 5.2. OK, I've added that in. > 7. The various edits, etc., have to > do with changing the structure of the document as Martin > suggests. I'm not wild about the structure either, as > indicated. You could follow his edits which do not change the > end result. OK, I think I'm happier that your work hasn't changed anything. > 8. Say explicitly that hex digits always appear > as uppercase. I think the BNF for hexDigits already does this. > 9. Add examples if you so desire. I've added some examples ( although non outside the BMP ) > > For Mike Champion's comments, I note that he also appears to > be commenting on the draft I revised???: > > 1. Referencing UCS-4, an obsolete encoding of Unicode, is one > of the things I changed. I used Unicode Scalar Value (that > is, code points) to get away from the vagaries of the > different encodings. Although UTF-32/UCS-4 are essentially > the scalar value, there is no need to get into things like > Big/LittleEndianness and other stuff. OK, we'll stick with USV. > > I should note that there are non-characters inside the > 0..10FFFF range. Saying USVs avoid those without a lot of > text to explain it, so long as you look up the precise > Unicode definition of all this stuff (for example in CharMod). OK, we'll reference the appropriate spec. Would that be CharMod? Or one of the Unicode specs? > > 2. The use of U+ notation is slightly confusing in the text. > I would change this sequence (at 5.4.1.1.a): > > "Let U1, U2, ... , U8 be the eight hex digits [PROD: 11] such > that Ci is "U+" U1 U2 ... U8 in the Unicode Scalar Value" > > to be: > > "Let U1, U2, ..., U8 be the eight hex digits [PROD: 11] in > the 32-bit Unicode Scalar Value of Ci. For example, a > character whose scalar value is > U+10FFFA would be represented by the sequence U1=0 U2=0 U3=1 > U4=0 U5=F > U+U6=F > U7=F U8=A ('0010FFFA')" > > Note that this makes clear that the encoding is wasteful. The > first two bytes will never be used for a value other than "00". > > Good luck with your editing. Thanks, and thank-you for all your input so far, Regards Martin Gudgin
Received on Sunday, 25 August 2002 21:03:17 UTC