- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Thu, 03 Apr 97 08:38:55 CST
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
On Thu, 3 Apr 1997 08:25:59 -0500 Martin Bryan said:

>In 1.5, why is production [2] the only one to use the two character
>forms of character references (e.g. #x0d) rather than the 4 character
>form (e.g. #x00ad) used elsewhere?

Because production 2 is also the only production to need hex character references of *more* than four digits, and it seemed (to me, at least) pointless to normalize 0d to 000d when we could not normalize 7FFFFFFF to a four-digit value. By the definition of the notation, #x0d and #x000d (and #xd!) are all synonymous, but if everyone who sees this assumes we must be trying to convey something of occult significance, we might as well normalize to 000d etc.

>It should be made clear in 2.7 that there is no way in which you can
>enter ]]> in a CDATA section as ]]> will only be recognized outside
>of such sections.

Isn't this clear enough already? It does follow directly from the explicit statement that only ']]>' is recognized. We cannot hope to list explicitly every consequence of every rule; there needs to be some reliance on having a reader capable of seeing that if only CDEnd is recognized, then Reference is not recognized (particularly since this inference is confirmed by the parenthetical remark about lt and amp).

>In 2.8 the second paragraph ends with a hanging sentence, viz:

Not in my copy. Are you sure you're not falling victim to the Netscape bug that displays some lines as white space if you scroll by small increments?

>In 3.3 the sentence reading:
>> At user option, an XML processor may issue a warning if
>> attributes are declared for an entity type not itself declared, but
>> this is not an error.
>should have "entity type" changed to "element type".

Thank you. My typo.

>For 3.4, under what circumstances is SkipLit valid if ignored marked
>sections can only contain complete markup declarations?

Where does it say that ignored marked sections can contain only complete markup declarations? 8879 doesn't say that, and neither do we -- if we did, that would suggest that a validating parser would have to parse ignored sections completely, which I think we don't want to do.

The rule for ignored sections has, I concede, become rather involved as a result of trying to ensure that all conditional sections -- and the DTD -- begin and end at the same locations, regardless of the values of their controlling parameter entities. I think the current rules achieve that end; I don't think any simpler rules do. Thanks to James for working it out (and for providing the example that illustrated the very real danger).

>For 4.3.3 shouldn't a statement be added that EncodingPI must be encoded
>in UTF-8 or be preceded by #xFEFF if encoded in UCS-2? (Allowing it to
>be encoded in any other way would give interoperability problems.)

I'm not sure I follow. Para 2 of 4.3.3 begins "Entities encoded in UCS-2 must begin with the Byte Order Mark ...", so it seems to me that what you are suggesting for UCS-2 is already required.

The EncodingPI itself is *not* required to be encoded in UTF-8; that suggestion was made last fall, and failed to generate consensus. The EncodingPI is written in the same encoding as the rest of the file, because one of the main advantages of in-file headers of this sort is that they can be maintained directly by users, without reliance on anything more elaborate than an editor that understands the encoding in use in the file. Anyone who has struggled for years, as I have, with system administrators too ignorant to understand character set issues and too busy to learn, and too wise to let me 'fix' the system routines myself, will appreciate the importance of letting the data be labeled by users who know what it is, rather than by system routines that don't.

Thanks for your corrections.

-C. M. Sperberg-McQueen
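
For readers following the 4.3.3 discussion, the quoted rule ("Entities encoded in UCS-2 must begin with the Byte Order Mark") can be illustrated with a minimal sketch. This is not part of the draft or of any XML processor; the function name `ucs2_byte_order` is hypothetical, and the sketch assumes the entity is available as raw bytes and only looks at the first two of them.

```python
import codecs


def ucs2_byte_order(data: bytes):
    """Return 'big' or 'little' if data starts with the Byte Order Mark
    (#xFEFF) in that byte order, or None if no mark is present."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    return None  # no mark: by the quoted rule, not a valid UCS-2 entity


# Hypothetical sample entities: a big-endian one, a little-endian one,
# and an unmarked one written in a single-byte encoding.
print(ucs2_byte_order(b"\xfe\xff\x00<"))
print(ucs2_byte_order(b"\xff\xfe<\x00"))
print(ucs2_byte_order(b"<?xml"))
```

Because the mark is checked before anything else is read, the rest of the entity, including the EncodingPI, can then be decoded in whatever byte order the mark indicates.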
Received on Thursday, 3 April 1997 10:10:39 UTC