- From: Tim Bray <tbray@textuality.com>
- Date: Thu, 16 Apr 1998 20:55:47 -0700
- To: "Richard L. Goerwitz III" <richard@goon.stg.brown.edu>, xml-editor@w3.org
At 10:39 AM 15/04/98 -0400, Richard L. Goerwitz III wrote:
> Why is [#x10000-#x10FFFF] specified here? It's not Unicode. In
> fact, it's representable only via UCS-4 and UTF-16 (in the latter
> case, with two-byte sequences). Ditto for \xac00-\xd7a3 in
> BaseChars (B.).

Uh, I believe it is Unicode; this is the set of characters representable in UTF-16 via surrogates. I don't have the standard here in Australia. Are we missing something... Michael?

> If you're going to go Unicode, why are you defining spaces only in
> the ASCII range? What about 2000-200F?

Oh we beat this to death. I and several other round-eye types really wanted to include IDEOGRAPHIC SPACE among other things in the nonterminal S. We encountered ferocious resistance from the Japanese, who assert repeatedly and assiduously that this is *not* really a space... they eventually won the argument.

> Why in heaven's name have Name and Nmtoken been defined and used in
> such a way that a lexical analyzer can't determine which is which?
...
> Note that I have already sent off to Michael a suggestion that the
> strings
>
>   PUBLIC
>   SYSTEM
...
> This will simplify tokenizing, and will make XML files themselves
> clearer (seeing as the function of these keywords will become
> static).

I can only say that I disagree. None of the authors of parsers to date have seemed to have difficulty either with parsing Names/NMTokens, or with disambiguating keywords... what would be the benefit of ruling out <CDATA /> or <ATTLIST />? Doing this seems like namespace piracy, with a distinct cost to end-users.

> What exactly are all those whitespace tokens doing in the syntax
> spec for DOCTYPE declarations

They show where white space is required/allowed to occur. They feel necessary to me; we have to convey that information *somehow*. I feel stupid here, I think I'm totally failing to get your point.

> Also, for the extsubset production: It's a hanging rule. It has no
> parent. Not terribly helpful for implementors. They have to know,
> telepathically, that in fact some of the external entities in the
> DOCTYPE declaration refer to text streams that must be read in and
> then parsed according to this production.

Not telepathically; it is made clear in the prose. Yes, it would have been better if the editors had been smart enough to cram *all* the syntax of XML into the grammar, but we weren't.

>3.3.1
>
> Rule 58 is malformed. It appears to be missing a trailing ')' token.

I just read it, and the parentheses seem to match up:

 [58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'

>4.3.3
>
> You use the phrase, "In the absence of information provided by an
> external transport protocol...." This means that we can override
> information contained (or not contained) in the XML file itself via
> some external mechanism.

The goal is that if a web server transcodes a resource, thus rendering the declaration wrong, but sends along a header so that the client is able to figure out the correct encoding, the client is not allowed to throw the document on the floor because the encoding decl has been made incorrect, penalizing the luckless end user for the sins of the web server.

>Appendix B
>
> - There is a typo in the Digit spec: #x0BE7-#x0BEF should be #x0BE6-#x0BEF

Don't have Unicode on the road with me... what's the problem?

 -Tim
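
For reference, the surrogate arithmetic behind the [#x10000-#x10FFFF] range discussed at the top of this message can be sketched as follows. This is only a minimal illustration in Python, not anything taken from the XML spec or from this thread; the function names to_surrogates and from_surrogates are hypothetical. It shows that the 1024 high surrogates (#xD800-#xDBFF) combined with the 1024 low surrogates (#xDC00-#xDFFF) cover exactly the code points #x10000 through #x10FFFF.

```python
# Illustrative sketch only: helper names are hypothetical, not from any spec.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates


def to_surrogates(code_point):
    """Split a code point in #x10000-#x10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-encoded range")
    offset = code_point - 0x10000          # 20-bit value
    high = HIGH_START + (offset >> 10)     # top 10 bits
    low = LOW_START + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the code point it represents."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        high, low = to_surrogates(cp)
        print(f"#x{cp:X} -> #x{high:X} #x{low:X}")
```

The two ends of the range map to the first and last possible pairs, which is why the Char production can name the range as a single interval.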
Received on Friday, 17 April 1998 00:01:27 UTC