
RE: Algorithm for mapping an application defined name to an XML name

From: Martin Gudgin <mgudgin@microsoft.com>
Date: Sun, 25 Aug 2002 18:02:45 -0700
Message-ID: <92456F6B84D1324C943905BEEAE0278E02681C90@RED-MSG-10.redmond.corp.microsoft.com>
To: "Addison Phillips [wM]" <aphillips@webmethods.com>, <asirv@webmethods.com>
Cc: "W3C Public Archive" <www-archive@w3.org>, "Jean-Jacques Moreau" <moreau@crf.canon.fr>, "Marc Hadley" <marc.hadley@sun.com>, "Nilo Mitra" <EUSNILM@am1.ericsson.se>, "Noah Mendelson" <noah_mendelsohn@us.ibm.com>, "Henrik Frystyk Nielsen" <henrikn@microsoft.com>


Thanks very much for your detailed comments. I've commented inline.


> -----Original Message-----
> From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
> Sent: 24 August 2002 00:05
> To: Martin Gudgin; asirv@webmethods.com
> Cc: W3C Public Archive; Jean-Jacques Moreau; Marc Hadley; 
> Nilo Mitra; Noah Mendelson; Henrik Frystyk Nielsen
> Subject: RE: Algorithm for mapping an application defined 
> name to an XML name
> Hi Martin,
> Thanks for the note. It's been a while since I thought about this.

Sorry it's taken us so long to incorporate your feedback.

> My edits were done from the original proposal. Although I
> modified the text to be more correct about various Unicode 
> issues, I didn't change the structure of the original at all. 
> (FWIW, I would have designed and written it differently. And 
> I hate standards that obfuscate what's going on as much as 
> this one does. It not being my document, I didn't rewrite it. 
> I just edited the text to be more correct.)

If you have (or have time to produce) a more readable version, I'm
sure the editorial team would be very grateful.

> The TAG item is confusing. Therefore I think the use of the
> word "TAG" in production rule 2 should be replaced with "T". 
> I would rewrite #2 slightly differently than Martin does to be:
> "2. Let <i>T</i> be a name in an application, where the name
> <i>T</i> is a sequence of characters in the character set, 
> encoding, and namespace of the application."  // note 
> addition of the encoding here

I agree about TAG vs T. I'm not sure how namespace plays here.

> For Martin Durst's comments, I note that he appears to be
> commenting on the original draft. My comments on his comments are:
> 1. Yes, it needs some explanatory text.

I'll add some :-)

> 2. I didn't change "Prefix be computed, etc." as it was
> beyond the scope of my review. If it is looked up instead of 
> being computed, you could change it to say that. Basically, 
> the appendix is saying: "get your prefix somehow, this stuff 
> deals with the 'local part'" 

I've changed 'computed' to 'determined' 

> 3. The left-to-right thing was
> fixed by using most-to-least significant (e.g. memory order). 


> 4. Already commented on TAG. By all means change that.
> 5. You could change M to UNI if you want. I kept the original
> notation here. Doesn't matter which you choose.

Agreed. I'm sticking with M.

> 6. Yes, the
> note should be added. I missed that on my edit. Note that 
> Martin's point about missing characters is dealt with (but 
> not very clearly) by 5.2. 

OK, I've added that in.

> 7. The various edits, etc., have to 
> do with changing the structure of the document as Martin 
> suggests. I'm not wild about the structure either, as 
> indicated. You could follow his edits which do not change the 
> end result. 

OK, I think I'm happier that your work hasn't changed anything.

> 8. Say explicitly that hex digits always appear 
> as uppercase. 

I think the BNF for hexDigits already does this.

> 9. Add examples if you so desire.

I've added some examples (although none outside the BMP).

> For Mike Champion's comments, I note that he also appears to 
> be commenting on the draft I revised???:
> 1. Referencing UCS-4, an obsolete encoding of Unicode, is one 
> of the things I changed. I used Unicode Scalar Value (that 
> is, code points) to get away from the vagaries of the 
> different encodings. Although UTF-32/UCS-4 are essentially 
> the scalar value, there is no need to get into things like 
> Big/LittleEndianness and other stuff.

OK, we'll stick with USV.
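
As a rough illustration of the encoding point above (a Python sketch, not taken from the draft under discussion): the scalar value of a character is a single number, whereas its UTF-32 serialization depends on byte order. The character U+10FFFA is an arbitrary choice for the example.

```python
ch = "\U0010FFFA"  # arbitrary example character outside the BMP

scalar_value = ord(ch)              # one number, independent of any encoding
utf32_be = ch.encode("utf-32-be")   # big-endian serialization
utf32_le = ch.encode("utf-32-le")   # little-endian serialization

print(hex(scalar_value))  # 0x10fffa
print(utf32_be.hex())     # 0010fffa
print(utf32_le.hex())     # faff1000
```

Working from the scalar value sidesteps the question of which of the two byte sequences is meant.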

> I should note that there are non-characters inside the 
> 0..10FFFF range. Saying "USVs" avoids those without a lot of
> text to explain it, so long as you look up the precise 
> Unicode definition of all this stuff (for example in CharMod).

OK, we'll reference the appropriate spec. Would that be CharMod? Or one
of the Unicode specs?

> 2. The use of U+ notation is slightly confusing in the text. 
> I would change this sequence (at
> "Let U1, U2, ... , U8 be the eight hex digits [PROD: 11] such 
> that Ci is "U+" U1 U2 ... U8 in the Unicode Scalar Value"
> to be:
> "Let U1, U2, ..., U8 be the eight hex digits [PROD: 11] in 
> the 32-bit Unicode Scalar Value of Ci. For example, a 
> character whose scalar value is
> U+10FFFA would be represented by the sequence U1=0 U2=0 U3=1
> U4=0 U5=F U6=F U7=F U8=A ('0010FFFA')"
> Note that this makes clear that the encoding is wasteful. The 
> first two bytes will never be used for a value other than "00".
> Good luck with your editing.
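
The U1..U8 wording suggested above can be sketched in Python as follows. This is only a sketch of that one step, not the full mapping algorithm; the function name `eight_hex_digits` is hypothetical, and rejecting surrogate code points is an assumption based on the Unicode definition of a scalar value rather than on the draft text.

```python
def eight_hex_digits(ch: str) -> str:
    """Return U1..U8 for a single character, as uppercase hex digits."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    value = ord(ch)
    # Surrogate code points (D800..DFFF) are not Unicode scalar values.
    if 0xD800 <= value <= 0xDFFF:
        raise ValueError("surrogate code points are not scalar values")
    # Zero-pad to eight digits; "X" yields uppercase A-F.
    return format(value, "08X")


digits = eight_hex_digits("\U0010FFFA")
print(digits)        # 0010FFFA
print(list(digits))  # ['0', '0', '1', '0', 'F', 'F', 'F', 'A']
```

Because the largest scalar value is 10FFFF, the first two digits come out as "00" for every character, which is the wastefulness noted above.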

Thanks, and thank you for all your input so far,


Martin Gudgin
Received on Sunday, 25 August 2002 21:03:17 UTC
