1 Nov. draft comments from Christopher R. Maden on 1996-11-12 (w3c-sgml-wg@w3.org from November 1996)

From: Christopher R. Maden <crm@ebt.com>
Date: Tue, 12 Nov 1996 06:02:32 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <199611120602.GAA03609@dot.EBT.COM>
All comments are against the 1 November draft.  However, the draft on
the W3C Web server and the one on Textuality are both identified as
version 0.01, 1 November, yet are different.  When not identified,
comments are believed applicable to both, but have only been checked
against (and cut&pasted from) the Textuality version.

Header:
   Pointer to "this version" is invalid.

Clause 1.1, W3C version only:
   Says that this is only for the SGML ERB, but is publicly available.
   It looks like a "leak".

Clause 1.5, production group 4:
   QuotedNames ::= '"' Names '"' | "'" Names "'"

The non-terminal Names is never defined.

Clause 2.2, first paragraph:
   A character is A character is an atomic...

Typo: "A character is" is repeated.

Clause 2.3, production group 6:
   | '<!DOCTYPE' (Name | S)+ ('[' [^]]* ']')? '>' /* doc type
     declaration */

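The beginning of the production component allows a jumble of names and
white space in any order (even with no space after the keyword), and
the end does not allow white space between the DSC (']') and the MDC
('>').  If the purpose is simply to allow recognition and skipping of
the doctype declaration, then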

'<!DOCTYPE' [^>]* ('[' [^]]* ']' S?)? '>'

should suffice; if more restrictive syntax is warranted, then
something like the doctypedecl production (in 2.8) should be invoked.
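To make the difference concrete, here is a rough sketch that transcribes both productions into Python regular expressions. S and Name are simplified ASCII stand-ins assumed for illustration; they are not the draft's definitions.

```python
import re

# Simplified stand-ins (assumptions, not the draft's definitions):
# S is a run of ASCII white space; NAME is an ASCII-only approximation.
S = r"[ \t\r\n]+"
NAME = r"[A-Za-z_:][-A-Za-z0-9._:]*"

# Draft: '<!DOCTYPE' (Name | S)+ ('[' [^]]* ']')? '>'
DRAFT = re.compile(r"<!DOCTYPE(?:%s|%s)+(?:\[[^\]]*\])?>" % (NAME, S))

# Suggested: '<!DOCTYPE' [^>]* ('[' [^]]* ']' S?)? '>'
SUGGESTED = re.compile(r"<!DOCTYPE[^>]*(?:\[[^\]]*\](?:%s)?)?>" % S)

def accepts(pattern, decl):
    """True if the whole declaration matches the production."""
    return pattern.fullmatch(decl) is not None

print(accepts(DRAFT, "<!DOCTYPEdoc>"))           # True: no space needed
print(accepts(DRAFT, "<!DOCTYPE doc [] >"))      # False: space before '>'
print(accepts(SUGGESTED, "<!DOCTYPE doc [] >"))  # True
```

fullmatch is used because the productions describe a whole declaration. Used as a prefix scanner with re.match, the greedy [^>]* in the suggested form would stop at the first '>' inside an internal subset, so a real scanner would need more care.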

Clause 2.3, last paragraph, W3C:
   the single-quote character (') may be represented as "&sq;", and
   the double-quote character (") as "&dq;".
Clause 2.3, last paragraph, Textuality:
   the single-quote character (') may be represented as "&sqot;", and
   the double-quote character (") as "&quot;".

Please synchronize the W3C version and Textuality's, and PLEASE change
version numbers when something as substantial as this changes.  Both
versions are identified as 0.01.  I far prefer the Textuality version
here.

Clause 2.4, first paragraph:
   Comments may appear anywhere that character data may, except in a
   marked section (more properly, comments appearing in a marked
   section will not be recognized as such).

Comments may appear in element content and in the prolog, as well, no?
In other words, "Comments may appear anywhere, except in a marked
section; i.e., within element content, in mixed content, or in a
document type declaration subset (see doctypedecl)."

Clause 2.6, and in general:

Wherever the draft uses a term that is important in ISO 8879 in a
different sense from 8879, the term 8879 uses for that concept should
be given for reference.  In this case, the term "marked section" in
XML refers to what 8879 calls a "CDATA marked section".  This should
be made clear in a note: since 8879 is referenced, a non-trivial
portion of implementers will consult it, and the different terminology
may confuse them.

Clause 2.7:

I still think that making whitespace handling vary by an attribute
value is a big mistake.  This is something that should be handled by a
stylesheet.  If all whitespace, everywhere, is preserved by the XML
processor, the application is guaranteed not to lose any it wants, and
can toss anything it doesn't.  For most formatting, some whitespace
will need to be collapsed and some kept, so a renderer needs this
ability anyway.  Adding it to the processor requirements seems
silly.
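A minimal sketch of that division of labour, assuming the processor hands every character of text content to the application unchanged and a stylesheet (here a hypothetical set of element type names) decides where white space matters:

```python
import re

# Hypothetical stylesheet setting: element types whose white space is kept.
PRESERVE = {"pre", "code"}

def collapse(text):
    """Collapse each run of white space to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

def render_text(element_type, text):
    """Application-side decision; the processor passed `text` through as-is."""
    return text if element_type in PRESERVE else collapse(text)

raw = "  Hello,\n    world!  "
print(render_text("p", raw))  # Hello, world!
print(repr(render_text("pre", raw)))
```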

Clause 2.7, example:
   <!ATTLIST * -XML-SPACE (PRESERVE|COLLAPSE) #IMPLIED>

Does this mean '*' is allowable at that point in an ATTLIST
declaration?  It should be made clear in the surrounding text.  (Given
that attlist declarations are not cumulative, this attribute will need
to be defined for every element on which it will be used, even if
just to default it.  Seems a bit silly.)

Clause 2.8, production group 12:

The placement of the production group breaks up the flow of text; the
paragraph after refers to "these two subsets", and I was very confused
as to *which* two until I realized that they had been referenced in
the paragraph prior to the production group.  Move the group down a
paragraph, just before the example, maybe.

Clause 2.9, third paragraph:
    1. attributes with default values, and elements to which these
       attributes apply appear in the document,..

I think a more applicable phrasing is, "attributes with default
values, and elements to which these attributes apply *and are not
explicitly set* appear in the document..." though this may be too
complex to easily check.

Clause 2.9, last paragraph:
   If no RMD is provided, the effect is identical to an RMD with the
   value ALL.

I feel that NONE should be the default.  The simplest XML document
should not require the RMD at all.

Clause 3.1, second text paragraph:
   The Name in the start- and end-tag rules gives the element's type.

Strike "rules", or reword this.  "The Name referred to in the ..." or
"The Name in the ... -tags gives...".

Ibid:
   The Name-QuotedCData pairs are referred to as the attributes of the
   element,...

Everyone here is aware that this is the attribute value specification,
but we use the terms interchangeably.  We must NOT do this in the XML
spec; it caused endless headaches when Netscape started to handle
entity refs in attribute value *specifications*.  The discussions
about when to use &amp; and when to use %26 in <a href="..."> went for
far too long on www-html, html-wg, and lynx-dev.

Care must be taken in XML to use the correct terms "attribute value
specification" and "attribute value" as appropriate.  Even though
entity references are not allowed in AVSs in XML 1.0, lack of
confusion now will make going forward easier.

Clause 3.1, production group 17:
   content ::= (element | PCDATA | MS | PI | Comment)*

There should be [ VC: Content model ] after that; i.e., the content of
an element will match the content model in the DTD if the document is
valid.

Clause 3.2, second list:
   More simply stated, the elements, delineated by start- and
   end-tags, nest within each other properly.

Either strike "properly" or define it.  Nesting makes sense, I think,
to the target non-SGML-aware audience; adding "properly" implies that
there's something special that's not being said.

Clause 3.5:

Where did it go?  It's in the W3C version, but not on Textuality.  Has
the DTD summary been done away with?  It's no longer needed for empty
elements, and is moot for mixed vs. element content distinction, but
would be a VERY useful way to override the defaulted entities without
requiring DTD parsing.  The receiving non-DTD-speaking application
could say, "This &prod; here does something other than a big ol' pi,
but I don't know what...".

Clause 4.2, productions:
   EntityDecl ::= '<!ENTITY' S Name EntityDef S? '>'
   EntityDef ::= Literal | ExternalDef;
...
   ExternalDef ::= NDataDecl? 'SYSTEM' Literal
   NDataDecl ::= 'NDATA' S Name S [ VC: Notation Declared ]

These are badly broken.

<!ENTITY foo"bar"> or <!ENTITY blortSYSTEM "farble"> is legal.  In
addition, the order of the external identifier and notation identifier
is backwards from 8879 production [108].  Try these:

EntityDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
ExternalDef ::= 'SYSTEM' S Literal (S NDataDecl)?
NDataDecl ::= 'NDATA' S Name [ VC: Notation Declared ]
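To check the corrected productions mechanically, here is a sketch that transcribes both versions into Python regular expressions. S, Name and Literal are simplified stand-ins (assumptions, not the draft's definitions), and the Notation Declared validity constraint is not checked.

```python
import re

# Simplified stand-ins (assumptions): ASCII white space, an ASCII-only
# Name, and a Literal quoted with " or ' that does not contain its quote.
S = r"[ \t\r\n]+"
NAME = r"[A-Za-z_:][-A-Za-z0-9._:]*"
LIT = "(?:\"[^\"]*\"|'[^']*')"

# Draft productions, transcribed as quoted (no S where the draft has none).
DRAFT_NDATA = f"NDATA{S}{NAME}{S}"
DRAFT_EXT = f"(?:{DRAFT_NDATA})?SYSTEM{LIT}"
DRAFT = re.compile(f"<!ENTITY{S}{NAME}(?:{LIT}|{DRAFT_EXT})(?:{S})?>")

# Suggested productions: S after the Name, S after SYSTEM, and the
# external identifier before NDATA, as in 8879 production [108].
FIXED_EXT = f"SYSTEM{S}{LIT}(?:{S}NDATA{S}{NAME})?"
FIXED = re.compile(f"<!ENTITY{S}{NAME}{S}(?:{LIT}|{FIXED_EXT})(?:{S})?>")

def legal(pattern, decl):
    """True if the whole declaration matches the production."""
    return pattern.fullmatch(decl) is not None

print(legal(DRAFT, '<!ENTITY foo"bar">'))  # True
print(legal(FIXED, '<!ENTITY foo"bar">'))  # False
print(legal(FIXED, '<!ENTITY pic SYSTEM "pic.gif" NDATA gif>'))  # True
```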

Clause 4.2.3, production group 30 (W3C) or 29 (Textuality):
   Encoding ::= ...

Are the members of the enumerated list the only encodings permitted in
XML 1.0, at all?  What about other IANA-registered encodings, if the
system can handle them?  What about X-prefixed ones that are spoken by
the system?

Clause 4.4, production group 30 (Textuality; 31? on W3C):
   NotationDecl ::= '<!NOTATION' S Name S Extid S? '>'

The Extid non-terminal is not defined.  It is also used in
doctypedecl, but I didn't notice until now.

Also, depending on what Extid ends up defined as, this can read as
requiring system identifiers on notation declarations.  In the worst
case, MIME types may be required; a for-real filename must NEVER be
required (and should be deprecated) in a notation declaration.

Appendix A, feature list:
    3. The "&" connector in content models

Make that 'The "and" connector'.

General gripe #1: Versioning

XML desperately needs versioning information.  Otherwise, XML 2.0 will
struggle painfully with how to avoid crippling XML 1.0 systems.  At a
minimum, XML 1.0 documents can go without versioning information, as
long as XML 1.0 processors know how to recognize a newer version.  I
would prefer that an explicit mechanism be provided now, but with a
default of 1.0 in its absence.  A required element has been rejected;
perhaps a PI, then: <?XML Version 1.0?>.
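As a rough sketch of how a processor might treat the proposed PI, assuming it must appear at the very start of the document and is spelled exactly <?XML Version n.n?> (both are assumptions; the message only suggests the form):

```python
import re

# Hypothetical PI spelling taken from the proposal: <?XML Version 1.0?>
VERSION_PI = re.compile(r"<\?XML\s+Version\s+(\d+\.\d+)\s*\?>")

# Versions this (hypothetical) processor understands.
SUPPORTED = {"1.0"}

def declared_version(document):
    """Version from a leading version PI; 1.0 when the PI is absent."""
    m = VERSION_PI.match(document.lstrip())
    return m.group(1) if m else "1.0"

def can_process(document):
    """Assumed policy: reject newer versions instead of misparsing them."""
    return declared_version(document) in SUPPORTED

print(declared_version("<doc></doc>"))                  # 1.0
print(can_process("<?XML Version 2.0?>\n<doc></doc>"))  # False
```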

General gripe #2: Case folding

I predict, based on the www-international list and private
conversations, that case-folding is not going to work as spec'd now.
But besides that: a non-folding document is valid in a folding system,
but the reverse is not true.  In other words, we can add case folding
later, but taking it out is going to HURT!  Keep XML 1.0 case
sensitive and spend the next year trying to figure out how to do case-
folding intelligently, if it can be done at all.  I think the number
of SGML documents (including HTML) that will be valid XML at this
point with no processing whatsoever is vanishingly small, and allowing
the individual data preparers to decide what to do with their case is
far better than to impose weird rules from above.
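A small sketch of the asymmetry, assuming folding means converting names to upper case before comparison (roughly what SGML's NAMECASE GENERAL YES does): every pair of names that matches case-sensitively also matches after folding, but not the other way round. Python's str.upper() also maps "ß" to "SS", so under this naive folding "straße" and "STRASSE" become the same name, one small example of why folding beyond ASCII is tricky.

```python
def names_match(start_name, end_name, fold):
    """Compare start-tag and end-tag names, optionally upper-case folded."""
    if fold:
        return start_name.upper() == end_name.upper()
    return start_name == end_name

# A case-consistent document passes either way...
print(names_match("para", "para", fold=False), names_match("para", "para", fold=True))
# ...but one that relies on folding breaks without it.
print(names_match("Para", "para", fold=True), names_match("Para", "para", fold=False))
```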

General gripe #3: HTML concessions

The list of elements is silly.  <img></img> works fine.  (I tried the
<br/> hack; <img/> gets through Lynx 2.6, but <br/> doesn't get
recognized.)  If an XML processor detects that the document is HTML,
then for God's sake process it as HTML!  I can think, off the top of
my head, of 2 SGML systems that drop into a special mode when HTML is
detected.  (If it were earlier, I could probably come up with more.)
The <e></e> acceptance in HTML means that documents *can* be XML and
acceptable to HTML browsers, and *all* HTML (unless anyone really uses
</p> and </li>) will require some processing.  Case fold it and
end-tag empties while you're at it.
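A rough sketch of that preprocessing step. The list of empty element types is an assumed subset of HTML 2.0's, and the regular expression assumes no '>' appears inside attribute values and that the input does not already carry end-tags for those elements; a real converter would need an HTML parser. Only the names of the empty elements are case folded here.

```python
import re

# Assumed subset of HTML 2.0 empty element types (not a complete list).
EMPTY = ("br", "hr", "img", "input", "meta", "link", "base", "isindex")

# Start-tag of one of those types, with optional attributes.
EMPTY_TAG = re.compile(r"<(%s)\b([^>]*)>" % "|".join(EMPTY), re.IGNORECASE)

def end_tag_empties(html):
    """Lower-case each empty element's name and add an explicit end-tag."""
    def fix(m):
        name = m.group(1).lower()
        return "<%s%s></%s>" % (name, m.group(2), name)
    return EMPTY_TAG.sub(fix, html)

print(end_tag_empties('<P>Text<BR>more <img src="a.gif"></P>'))
```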

The same goes for the entity list.  I am very much in favor of &lt; and
&amp;, and mostly in favor of the HTML 2.0 list.  A list of future
entities has, I think, no place in the XML 1.0 spec.  It's being
announced in a little over a week; no one is used to those entities
yet.  If there were a DTD-less way to override them, as with PIs, I
wouldn't mind so much.

(And as for who will incorporate them into Lynx: probably me.  Lynx
2.6 currently has the biggest list of supported entities of any Web
client, and I'd be happy (if busy) to keep it on top of the stack.)

Bleah.  I'm going to bed now.  Please excuse terminal-induced
weirdness in this message.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//GCA//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//EBT//NONSGML Christopher R. Maden//EN" SYSTEM
"<URL>http://www.ebt.com <TEL>+1.401.421.9550 <FAX>+1.401.521.2030
<USMAIL>One Richmond Square, Providence, RI 02906 USA" NDATA SGML.Geek>
Received on Tuesday, 12 November 1996 01:03:35 UTC