Re: XML standard comments from Tim Bray on 1998-04-17 (xml-editor@w3.org from April to June 1998)

From: Tim Bray <tbray@textuality.com>
Date: Thu, 16 Apr 1998 20:55:47 -0700
To: "Richard L. Goerwitz III" <richard@goon.stg.brown.edu>, xml-editor@w3.org
Message-Id: <3.0.32.19980416182039.00740914@pop.intergate.bc.ca>

At 10:39 AM 15/04/98 -0400, Richard L. Goerwitz III wrote:
>  Why is [#x10000-#x10FFFF] specified here?  It's not Unicode.  In
>  fact, it's representable only via UCS-4 and UTF-16 (in the latter
>  case, with two-byte sequences).  Ditto for \xac00-\xd7a3 in
>  BaseChars (B.).

Uh, I believe it is Unicode; this is the set of characters representable
in UTF-16 via surrogates.  I don't have the standard here in Australia.
Are we missing something... Michael?
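
For what it's worth, the surrogate arithmetic can be sketched in a few
lines of Python.  This is only an illustration: the function name is made
up here, and nothing like it appears in the spec.  Each character in
[#x10000-#x10FFFF] comes out as two 16-bit code units.

```python
def to_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-encoded range")
    offset = cp - 0x10000             # 20-bit value to distribute
    high = 0xD800 + (offset >> 10)    # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go in the low surrogate
    return high, low


first = to_surrogates(0x10000)    # (0xD800, 0xDC00)
last = to_surrogates(0x10FFFF)    # (0xDBFF, 0xDFFF)
```

The two ends of the range map onto the two ends of the surrogate blocks,
which is why the Char production stops at #x10FFFF.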

>  If you're going to go Unicode, why are you defining spaces only in
>  the ASCII range?  What about 2000-200F?

Oh we beat this to death.  I and several other round-eye types really 
wanted to include IDEOGRAPHIC SPACE among other things in the nonterminal
S.  We encountered ferocious resistance from the Japanese, who assert
repeatedly and assiduously that this is *not* really a space... they
eventually won the argument.
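
To make the outcome concrete, here is a rough Python sketch of the S
nonterminal as it ended up, S ::= (#x20 | #x9 | #xD | #xA)+.  The helper
name is hypothetical and the regular expression is just one way to write
the production.

```python
import re

# S ::= (#x20 | #x9 | #xD | #xA)+  -- only these four characters qualify
XML_S = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_xml_space(text):
    """Return True if the whole string matches the S production."""
    return XML_S.fullmatch(text) is not None


ascii_ok = is_xml_space(" \t\r\n")       # True
ideographic_ok = is_xml_space("\u3000")  # False: IDEOGRAPHIC SPACE is not in S
en_quad_ok = is_xml_space("\u2000")      # False: nothing from 2000-200F is in S
```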

>  Why in heaven's name have Name and Nmtoken been defined and used in
>  such a way that a lexical analyzer can't determine which is which?
...
>  Note that I have already sent off to Michael a suggestion that the
>  strings
>
>   PUBLIC
>   SYSTEM
...
>  This will simplify tokenizing, and will make XML files themselves
>  clearer (seeing as the function of these keywords will become
>  static).

I can only say that I disagree.  None of the authors of parsers to date
have seemed to have difficulty either with parsing Names/Nmtokens, or
with disambiguating keywords... what would be the benefit of
ruling out <CDATA /> or <ATTLIST />?  Doing this seems like namespace
piracy, with a distinct cost to end-users.
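
As a quick illustration that the keywords are ordinary Names when they
appear as element types, here is a small sketch using Python's standard
xml.etree.ElementTree module, picked only as an example parser.  The
<doc> wrapper element is made up for the test.

```python
import xml.etree.ElementTree as ET

# Declaration keywords used as plain element names inside a made-up wrapper.
root = ET.fromstring("<doc><CDATA /><ATTLIST /><PUBLIC /></doc>")

# Collect the element type names in document order.
tags = [child.tag for child in root]
```

The document is well-formed, so the parse succeeds and tags holds the
three keyword strings as element names.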

>  What exactly are all those whitespace tokens doing in the syntax
>  spec for DOCTYPE declarations 

They show where white space is required/allowed to occur.  They feel
necessary to me; we have to convey that information *somehow*.  I feel
stupid here; I think I'm totally failing to get your point.

>  Also, for the extsubset production: It's a hanging rule.  It has no
>  parent.  Not terribly helpful for implementors.  They have to know,
>  telepathically, that in fact some of the external entities in the
>  DOCTYPE declaration refer to text streams that must be read in and
>  then parsed according to this production.

Not telepathically; it is made clear in the prose.  Yes, it would have
been better if the editors had been smart enough to cram *all* the syntax
of XML into the grammar, but we weren't.

>3.3.1
>
>  Rule 58 is malformed.  It appears to be missing a trailing ')' token.

I just read it, and the parentheses seem to match up:


[58] NotationType ::= 'NOTATION' S '(' S?
                      Name (S? '|' S?
                      Name)* S? ')'
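
The check can also be done mechanically.  Below is a minimal Python
sketch, with a made-up function name, that counts grouping parentheses
while skipping the quoted literals '(' and ')'.  It assumes literals are
delimited by single quotes, as they are in this rule.

```python
RULE_58 = "'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'"


def grouping_parens_balanced(rule):
    """Check that unquoted parentheses in a grammar rule are balanced."""
    depth = 0
    in_literal = False
    for ch in rule:
        if ch == "'":
            in_literal = not in_literal   # enter or leave a quoted literal
        elif not in_literal:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:             # a close with no matching open
                    return False
    return depth == 0 and not in_literal


rule_58_ok = grouping_parens_balanced(RULE_58)
```

The quoted '(' and ')' are terminals of the production, so only the
(S? '|' S? Name)* group counts as grouping, and it opens and closes once.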

>4.3.3
>
>  You use the phrase, "In the absence of information provided by an exter-
>  nal transport protocol...."  This means that we can override informa-
>  tion contained (or not contained) in the XML file itself via some ex-
>  ternal mechanism.

The goal is that if a web server transcodes a resource, thus rendering
the declaration wrong, but sends along a header so that the client is
able to figure out the correct encoding, the client is not allowed to
throw the document on the floor because the encoding decl has been made
incorrect, penalizing the luckless end user for the sins of the
web server.
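
In rough Python terms, the precedence being described looks something
like the sketch below.  The function and parameter names are invented
for illustration, and the sketch deliberately ignores byte-order-mark
detection and the details of any particular transport protocol.

```python
def effective_encoding(transport_charset, declared_encoding, default="UTF-8"):
    """Pick the encoding a client should use to decode an XML entity.

    transport_charset: charset from an external header, or None (assumed input)
    declared_encoding: value of the encoding declaration, or None
    """
    if transport_charset:
        # External information wins, even if the declaration disagrees.
        return transport_charset
    if declared_encoding:
        return declared_encoding
    return default


# A server transcoded the resource but left the old declaration in place.
chosen = effective_encoding("ISO-8859-1", "Shift_JIS")   # "ISO-8859-1"
```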

>Appendix B
>
>  - There is a typo in the Digit spec:  #x0BE7-#x0BEF should be #x0BE6-#x0BEF

Don't have Unicode on the road with me... what's the problem? -Tim
Received on Friday, 17 April 1998 00:01:27 UTC