- From: Christopher R. Maden <crm@ebt.com>
- Date: Tue, 12 Nov 1996 06:02:32 GMT
- To: w3c-sgml-wg@w3.org
All comments are against the 1 November draft. However, the draft on the W3C Web server and the one on Textuality are both identified as version 0.01, 1 November, yet are different. When not identified, comments are believed applicable to both, but have only been checked against (and cut&pasted from) the Textuality version.

Header:

Pointer to "this version" is invalid.

Clause 1.1, W3C version only:

Says that this is only for the SGML ERB, but is publicly available. It looks like a "leak".

Clause 1.5, production group 4:

    QuotedNames ::= '"' Names '"' | "'" Names "'"

The non-terminal Names is never defined.

Clause 2.2, first paragraph:

    A character is A character is an atomic...

Typo.

Clause 2.3, production group 6:

    | '<!DOCTYPE' (Name | S)+ ('[' [^]]* ']')? '>' /* doc type declaration */

The beginning of the production component allows a jumble, and the end does not allow space between DSC and MDC. If the purpose is to simply allow recognition and skipping of the doctype declaration, then

    '<!DOCTYPE' [^>]* ('[' [^]]* ']' S?)? '>'

should suffice; if more restrictive syntax is warranted, then something like the doctypedecl production (in 2.8) should be invoked.

Clause 2.3, last paragraph, W3C:

    the single-quote character (') may be represented as "&sq;", and the double-quote character (") as "&dq;".

Clause 2.3, last paragraph, Textuality:

    the single-quote character (') may be represented as "&sqot;", and the double-quote character (") as "&quot;".

Please synchronize the W3C version and Textuality's, and PLEASE change version numbers when something as substantial as this changes. Both versions are identified as 0.01. I far prefer the Textuality version here.

Clause 2.4, first paragraph:

    Comments may appear anywhere that character data may, except in a marked section (more properly, comments appearing in a marked section will not be recognized as such).

Comments may appear in element content and in the prolog, as well, no?
In other words, "Comments may appear anywhere, except in a marked section; i.e., within element content, in mixed content, or in a document type declaration subset (see doctypedecl)."

Clause 2.6, and in general:

Wherever using a term important to ISO 8879 in a different manner from 8879, the term 8879 uses for the concept should be given for reference. In this case, the term "marked section" in XML refers to what 8879 calls "CDATA marked section". This should be made clear in a note; as 8879 is referenced, some non-trivial portion of implementers will make reference to it, and different terminology may confuse them.

Clause 2.7:

I still think that making whitespace handling vary by an attribute value is a big mistake. This is something that should be handled by a stylesheet. If all whitespace, everywhere, is preserved by the XML processor, the application is guaranteed not to lose any it wants, and can toss anything it doesn't. For most formatting, some whitespace will need to be collapsed and others kept, so a renderer would need this ability anyway. Adding it to the processor requirement seems silly.

Clause 2.7, example:

    <!ATTLIST * -XML-SPACE (PRESERVE|COLLAPSE) #IMPLIED>

Does this mean '*' is allowable at that point in an ATTLIST declaration? It should be made clear in the surrounding text. (Given that attlist declarations are not cumulative, this attribute will need to be defined for every element on which it will be used, even if just to default it. Seems a bit silly.)

Clause 2.8, production group 12:

The placement of the production group breaks up the flow of text; the paragraph after refers to "these two subsets", and I was very confused as to *which* two until I realized that they had been referenced in the paragraph prior to the production group. Move the group down a paragraph, just before the example, maybe.

Clause 2.9, third paragraph:

    1. attributes with default values, and elements to which these attributes apply appear in the document,..

I think a more applicable phrasing is, "attributes with default values, and elements to which these attributes apply *and are not explicitly set* appear in the document..." though this may be too complex to easily check.

Clause 2.9, last paragraph:

    If no RMD is provided, the effect is identical to an RMD with the value ALL.

I feel that NONE should be the default. The simplest XML document should not require the RMD at all.

Clause 3.1, second text paragraph:

    The Name in the start- and end-tag rules gives the element's type.

Strike "rules", or reword this. "The Name referred to in the ..." or "The Name in the ... -tags gives...".

Ibid:

    The Name-QuotedCData pairs are referred to as the attributes of the element,...

Everyone here is aware that this is the attribute value specification, but we use the terms interchangeably. We must NOT do this in the XML spec; it caused endless headaches when Netscape started to handle entity refs in attribute value *specifications*. The discussions about when to use & and when to use %24 in <a href="..."> went for far too long on www-html, html-wg, and lynx-dev. Care must be taken in XML to use the correct terms "attribute value specification" and "attribute value" as appropriate. Even though entity references are not allowed in AVSs in XML 1.0, lack of confusion now will make going forward easier.

Clause 3.1, production group 17:

    content ::= (element | PCDATA | MS | PI | Comment)*

There should be [ VC: Content model ] after that; i.e., the content of an element will match the content model in the DTD if the document is valid.

Clause 3.2, second list:

    More simply stated, the elements, delineated by start- and end-tags, nest within each other properly.

Either strike "properly" or define it. Nesting makes sense, I think, to the target non-SGML-aware audience; adding "properly" implies that there's something special that's not being said.

Clause 3.5:

Where did it go? It's in the W3C version, but not on Textuality.
Has the DTD summary been done away with? It's no longer needed for empty elements, and is moot for mixed vs. element content distinction, but would be a VERY useful way to override the defaulted entities without requiring DTD parsing. The receiving non-DTD-speaking application could say, "This &prod; here does something other than a big ol' pi, but I don't know what...".

Clause 4.2, productions:

    EntityDecl ::= '<!ENTITY' S Name EntityDef S? '>'
    EntityDef ::= Literal | ExternalDef;
    ...
    ExternalDef ::= NDataDecl? 'SYSTEM' Literal
    NDataDecl ::= 'NDATA' S Name S [ VC: Notation Declared ]

These are badly broken. <!ENTITY foo"bar"> or <!ENTITY blortSYSTEM "farble"> is legal. In addition, the order of the external identifier and notation identifier is backwards from 8879 production [108]. Try these:

    EntityDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
    ExternalDef ::= 'SYSTEM' S Literal (S NDataDecl)?
    NDataDecl ::= 'NDATA' S Name [ VC: Notation Declared ]

Clause 4.2.3, production group 30 (W3C) or 29 (Textuality):

    Encoding ::= ...

Are the members of the enumerated list the only encodings permitted in XML 1.0, at all? What about other IANA-registered encodings, if the system can handle them? What about X-prefixed ones that are spoken by the system?

Clause 4.4, production group 30 (Textuality; 31? on W3C):

    NotationDecl ::= '<!NOTATION' S Name S Extid S? '>'

The Extid non-terminal is not defined. It is also used in doctypedecl, but I didn't notice until now. Also, depending on what Extid ends up defined as, this can read as requiring system identifiers on notation declarations. In the worst case, MIME types may be required, a for-real filename must NEVER be required (and should be deprecated) in a notation declaration.

Appendix A, feature list:

    3. The "&" connector in content models

Make that 'The "and" connector'.

General gripe #1: Versioning

XML desperately needs versioning information.
Otherwise, XML 2.0 will struggle painfully with how to make itself not cripple XML 1.0 systems. At a minimum, XML 1.0 can have an absence of versioning information if it knows how to recognize a newer version. I would prefer that an explicit mechanism be provided now, but with a default of 1.0 in its absence. A required element has been rejected; perhaps a PI, then: <?XML Version 1.0?>.

General gripe #2: Case folding

I predict, based on the www-international list and private conversations, that case-folding is not going to work as spec'd now. But besides that: a non-folding document is valid in a folding system, but the reverse is not true. In other words, we can add case folding later, but taking it out is going to HURT! Keep XML 1.0 case sensitive and spend the next year trying to figure out how to do case-folding intelligently, if it can be done at all. I think the number of SGML documents (including HTML) that will be valid XML at this point with no processing whatsoever is vanishingly small, and allowing the individual data preparers to decide what to do with their case is far better than to impose weird rules from above.

General gripe #3: HTML concessions

The list of elements is silly. <img></img> works fine. (I tried the <br/> hack; <img/> gets through Lynx 2.6, but <br/> doesn't get recognized.) If an XML processor detects that the document is HTML, then for God's sake process it as HTML! I can think, off the top of my head, of 2 SGML systems that drop into a special mode when HTML is detected. (If it were earlier, I could probably come up with more.) The <e></e> acceptance in HTML means that documents *can* be XML and acceptable to HTML browsers, and *all* HTML (unless anyone really uses </p> and </li>) will require some processing. Case fold it and end-tag empties while you're at it.

The entity list is also this way. I am very in favor of &lt; and &amp;, and mostly in favor of the HTML 2.0 list.
A list of future entities has, I think, no place in the XML 1.0 spec. It's being announced in a little over a week; no one is used to those entities yet. If there were a DTD-less way to override them, as with PIs, I wouldn't mind so much. (And as for who will incorporate them into Lynx: probably me. Lynx 2.6 currently has the biggest list of supported entities of any Web client, and I'd be happy (if busy) to keep it on top of the stack.)

Bleah. I'm going to bed now. Please excuse terminal-induced weirdness in this message.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//GCA//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//EBT//NONSGML Christopher R. Maden//EN" SYSTEM
"<URL>http://www.ebt.com <TEL>+1.401.421.9550 <FAX>+1.401.521.2030
<USMAIL>One Richmond Square, Providence, RI 02906 USA" NDATA SGML.Geek>
Received on Tuesday, 12 November 1996 01:03:35 UTC