- From: <lee@sq.com>
- Date: Sat, 16 Nov 96 00:27:24 EST
- To: gtn@ebt.com, w3c-sgml-wg@w3.org
Some comments on Gavin's comments (sorry) and some new comments.

> Production 1.
> This group does not appear complete.
> Character classes are best defined elsewhere anyway.

Perhaps the best way to do this would be to use the Posix regexp notation and say:

    S ::= [:space:]

and then in the character class section define exactly which code points map to space.

An even better way would be to remove S entirely, and to explain that the following sequences are self-delimiting and are thus recognised whether or not surrounded by spaces:

    [:space:]
    <?
    <!--    (in XML this starts a comment)
    <!
    </
    <
    &
    >

the remaining tokens being determined by their respective modes -- for example, a string starts with " and continues to a matching " irrespective of whitespace.

The productions would then all become much simpler, as they would be in terms of a sequence of tokens.

I can't check the ISO 10646 code points here, but assume several people have done so.

I am not sure I follow

    Literal data is any quoted string containing neither a left angle bracket nor the quotation mark used as a delimiter

Are you forbidding an unquoted < within an attribute value?

The requirement for a root seems to preclude forests. We've found forests to be very useful, especially in the Canadian Winter :-)

In 2.4,

    Most processors will require the more complex grammar [...]

I think it might be more helpful to say why --- e.g.:

    Any non-trivial application is likely to require...

In 2.5

    For compatibility

is meaningless unless you say with _what_... I realise the SGML world is in fact quite ashamed of the SGML syntax (even though the ideas are good), but I think you should say

    For compatibility with SGML

if that's what you mean. If you want compatibility with the majority of HTML browsers in use today (and probably for the next year or two), you would also need to forbid > within a comment.

2.6

    PI target... notation

You need a cross reference to the definition of "notation".
I think someone else already asked what "normally" means. I take it to mean "in all cases except for the string "XML"".

    The application to which it belongs

to which what belongs? The PI or the target or the notation?

2.7 CDATA sections should probably be called CDATA Marked Sections, so that other kinds of marked section can be introduced in future versions of XML, should it be so desired. I particularly want ignored TMP NDATA marked sections :-)

> Section 2.8:
> This section is really quite distasteful.

I agree. The statement

    In element content, all white space (S) is ignored

seems a little odd to me!

    <P>This is odd</P>
    <P>Thisisodd</P>

are the same? Or is "element content" being used in the SGML sense? If so, it must be defined before being used, and I would strongly urge the use of italics or some other indication that "element content" does not mean "element content" but means "element context" (so to speak).

In COLLAPSE mode,

    <P><!--* this is a comment *--> <E>

is the same as

    <P> <E>

but if there is an SGML parser, we get

    <P><E>

(right?) I don't think this will help interoperability.

All white space should be retained at the parser level in XML, at least outside of a DTD. Inside a DTD I'd really hate it if a parser included the S nonterminal in parse trees!

    The XML _document type declaration_ may include a pointer...

I think "pointer" is misleading here. You don't mean a machine address, for example, but rather some kind of logical pointer, and should say so.

I agree with Gavin that the PI hack sucks. I can't accept that it is better than a fixed outermost element of XML with attributes, and I don't accept that it is better than MIME headers. It is not compatible with SGML or SGML tools, even though it is in some sense legal SGML: it will not in itself allow an XML file to be read by an SGML application, as the SGML application won't know how to switch character sets based on the PI.

[34] RMDecl

    default value of ALL...
    if neither internal nor external subset no "visible" effect:

what invisible effect is occurring?

It would be far simpler to say that if there is no subset and no RMDecl, it defaults to NONE, but that if a subset is given, it defaults to INTERNAL if an internal subset only is given, and ALL otherwise.

It would be a good idea to have a value that meant

    parse the internal subset if there is one
    parse the external subset if there is one
    if not, default to NONE

as this would significantly ease document maintenance, I think.

The spec is nearly 30 pages, by the way -- time to simplify it!

> Production 38. The stuff about HTML really
> belongs in an appendix "Interoperability with
> HTML", possibly containing the variant HTML DTD's.

Agreed. This section is greatly improved, by the way... :-)

3.2

Well, it's late & I'm hungry, so I have to go and forage for restaurants :-) More later.

Hang on --

> Section 4.2.2 Seems a shame to limit SYSTEM ID's
> to URL's. The FSI backward compatibility note
> seemed enough to allow them...

I don't understand this comment. Seemed enough to allow what? URLs _are_ allowed. It's a really bad idea to prefix them with <URL>, as that way you can't treat the same file as containing filenames and as containing URLs.

If SGML used only the same syntax everywhere, so that FSIs were attributes on elements, we could use arch forms!

Lee
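
The RMDecl defaulting rule suggested above under [34] (NONE with no subset, INTERNAL with an internal subset only, ALL otherwise) can be sketched as a small function. This is only an illustration in Python, not anything taken from the draft: the function name `effective_rmd` and its parameters are hypothetical, and it assumes an explicitly given RMDecl always overrides the default. Only the three values NONE, INTERNAL and ALL come from the discussion.

```python
from typing import Optional

# The three RMDecl values mentioned in the discussion.
RMD_VALUES = ("NONE", "INTERNAL", "ALL")


def effective_rmd(declared: Optional[str],
                  has_internal_subset: bool,
                  has_external_subset: bool) -> str:
    """Return the RMDecl value to use under the suggested defaulting rule.

    `declared` is the value given explicitly in the document, or None if
    there is no RMDecl. (Hypothetical helper; the names are assumptions.)
    """
    if declared is not None:
        # Assumption: an explicit RMDecl always wins over any default.
        if declared not in RMD_VALUES:
            raise ValueError(f"unknown RMDecl value: {declared!r}")
        return declared
    if not has_internal_subset and not has_external_subset:
        # No subset and no RMDecl: default to NONE.
        return "NONE"
    if has_internal_subset and not has_external_subset:
        # Only an internal subset is given: default to INTERNAL.
        return "INTERNAL"
    # An external subset is present, with or without an internal one.
    return "ALL"
```

For example, `effective_rmd(None, True, False)` gives INTERNAL, while a document with no subsets and no RMDecl gets NONE, so there is no case left in which a default has only an "invisible" effect.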
Received on Saturday, 16 November 1996 00:27:40 UTC