XML Draft Comments from Len Bullard on 1996-11-18 (w3c-sgml-wg@w3.org from November 1996)

From: Len Bullard <cbullard@HiWAAY.net>
Date: Mon, 18 Nov 1996 13:45:08 -0600
To: w3c-sgml-wg@w3.org
Message-ID: <3290BCC4.3F44@HiWAAY.net>
These are my comments on the Nov 14 draft.

>Extensible Markup Language (XML) is an extremely simple dialect of SGML which is completely
>described in this document. 

Remove "extremely simple".  It cannot be evaluated technically without
some criteria.

>The goal is to enable generic SGML to be served, received, and
>processed on the Web in the way that is now possible with HTML. 

Replace "generic SGML" with "this dialect of SGML" to point out the
difference clearly.  The term "generic SGML" is redundant in one sense,
and not sensible in another, in that any SGML in use is an application,
not SGML itself.  The term "application profile", used later, is better
because it will be better understood in the standards community, apart
from the confusion caused by the common SGML practice of using the terms
DTD and SGML application interchangeably.

Like three-letter acronyms, all words are reserved by somebody.  ;-)
Unless the draft points to a clarifying glossary, you might want to
include one in an appendix somewhere.

>For this reason, XML has been
>designed for ease of implementation, and for interoperability with both SGML and HTML.

As HTML is an application of SGML, interoperability should refer to the
systems that handle applications of SGML and, specifically in the
sentence cited, the systems that handle HTML.  As written, this is
grandfathering and an imprecise equivalence.  Change this to
"interoperability with systems".

It is better to use "portability" where you mean "I send you this
markup and you do what you do with it", and to use "interoperability"
to mean "I send you this command and you do what you must".  If you
send an object that has both, you mean both.

>Extensible Markup Language, abbreviated XML, describes a class of data objects called XML
>documents which are stored on computers, and partially describes the behavior of programs which
>process these objects.

Does the language specify a behavior, or does the spec prescribe a
behavior based on the type of data object?  It's quibbling, but unless
there is a script in the XML, I expect it only to send data.

>XML documents are made up of storage units called entities, which contain either text or binary
>data.

Text is immediately defined.  Binary is defined later as notation.
Would it be prudent to note this here in some short form, or to change
this to "text and non-text types defined by formal notation citation"?
Ick....  Binary seems restrictive in that SGML, VRML, CGM, etc. are all
notations and are not necessarily binary.  The later explanation

	>So-called binary data may in fact be textual, perhaps even
	>well-formed XML text; its identification as binary means that an XML
	>processor need not parse it in the fashion described by the specification.

doesn't justify the use of the term "binary".  Why is this term used
when all it appears to mean is "not constrained by the XML
specification"?

>A software module called an XML processor is used to read XML documents and provide
>access to their content and structure. It is assumed that an XML processor is doing its work on
>behalf of another module, referred to as the application.

Processor is fine.  I believe HyTime uses the term "engine".  Are these
equivalent?  The term "application" is loose and may confuse the SGML
user.  Again, there are no unreserved terms left.  Would it be better
to use the terms "server" and "client"?

1.5 Syntactic Constraints 

The tables should be in an appendix unless that makes them
non-normative.  Examples would suffice.

>Entities must each contain an integral number of elements, comments, processing instructions,
>and references, possibly together with character data not contained within any element in the
>entity, or else they must contain non-textual data, which by definition contains no elements.

Just to be sure, does this preclude the use of XML to reference an
entity that contains other SGML/non-XML application data?

>Users may extend the ISO 10646 character repertoire, in the rare cases where this is necessary,
>by exploiting the private use areas.

An editor's design note explaining the ramifications of this might be
prudent.  Another tack is to add such notes to a follow-on "The
Annotated XML Specification", which could be privately written and
published.  This is being done with VRML to help implementors with
areas of the spec that need explanation rather than clarification, and
to document design decisions.  Clear transcripts of all design
discussions and decisions are useful for this task.

> Comments may appear anywhere except in a CDATA section, i.e. within element content, in
> mixed content, or in a DTD subset. They may not occur within declarations or tags. They are
> not part of the document's character data; an XML processor may, but need not, make it
> possible for an application to retrieve the text of comments. For compatibility, the string --
> (double-hyphen) may not occur within comments. 

Is the use of "may" correct in this section as defined earlier?
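The comment rules quoted above can be tried against a real parser.  A
minimal sketch, assuming Python 3.8+ and its standard
xml.etree.ElementTree module (built on expat, which follows the later
XML 1.0 Recommendation rather than this draft, so details may differ):
by default the comment is read but not passed to the application, it
can be retained on request, and "--" inside a comment is rejected.  The
helpers parse and is_well_formed are hypothetical names.

```python
import xml.etree.ElementTree as ET


def parse(text, keep_comments=False):
    """Parse an XML string; keep_comments asks the tree builder to retain comments."""
    builder = ET.TreeBuilder(insert_comments=keep_comments)
    parser = ET.XMLParser(target=builder)
    parser.feed(text)
    return parser.close()  # returns the root element built by the TreeBuilder


def is_well_formed(text):
    """Return True if the parser accepts the string, False on a syntax error."""
    try:
        parse(text)
        return True
    except ET.ParseError:
        return False


doc = "<note><!-- reviewer remark -->text</note>"

# Default: the comment is parsed but not handed on to the application.
plain = parse(doc)
print(len(plain), plain.text)  # 0 text

# On request, the comment survives as a child node of <note>.
kept = parse(doc, keep_comments=True)
print(kept[0].tag is ET.Comment, repr(kept[0].text))  # True ' reviewer remark '

# A double hyphen inside a comment makes the document not well-formed.
print(is_well_formed("<note><!-- a -- b --></note>"))  # False
```

This matches the "may, but need not" wording: exposing comment text is
an option of the processor, not a requirement.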

>Processing instructions (PIs) allow the XML processor to pass instructions directly to
>selected applications.

Since PIs have a deprecated heritage and an undeserved bad reputation,
should some explanation of their intended use be added in a
non-normative appendix?  This will come up again in later discussions
of linking and of the design of external interfaces to XML processors.

> CDATA sections begin with the string <![CDATA[ and end with the string ]]>

Bugugly but ok.  We are used to it and an editor can hide the acne scars.
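What the CDATA delimiters buy in practice can be shown in a few lines.
A minimal sketch, again assuming Python's xml.etree.ElementTree rather
than any processor written for this draft: markup-like text inside the
section stays plain character data, and because "]]>" always ends a
section, the hypothetical helper to_cdata splits that sequence across
two adjacent sections.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA; any ']]>' is split across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


source = "if (a < b) { /* <b>bold</b> <!-- x --> */ }"
root = ET.fromstring("<code>" + to_cdata(source) + "</code>")

# The tags and the comment-like string inside the section are not parsed as markup.
print(root.text == source, len(root))  # True 0

# Text containing the closing delimiter still round-trips.
tricky = "a]]>b"
print(ET.fromstring("<t>" + to_cdata(tricky) + "</t>").text)  # a]]>b
```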

>In element content, all white space (S) is ignored; validating XML processors must not pass it
>to the application. Non-validating processors which do not read the DTD must treat all
>elements as if they were declared with mixed content; this will in some cases result in a different
>parse tree from that produced by processors which do read the DTD.

Another candidate for design notes.  The impact of "a different parse
tree" should be noted.

>The white space handling mode is signaled through the use of a reserved attribute; XML
>processors must behave as though every element encountered in the document had an attribute
>declared thus:

TripleBugUgly.  In effect, to get around having a DTD, the DTD
information moves into the instance.  I don't know any other way,
though, unless XML application communities are given a way to
meta-declare this information or put it in the stylesheet.  BTW, which
behavior is the default if the attribute is not specified in the
instance?  Is it undefined or left to the implementor?  An argument in
support of this approach is that any editor worth having will
automatically insert this attribute value for elements that clearly
require it.

>A document author can communicate whether or not
>DTD processing is necessary using a required markup declaration (abbreviated RMD)
>processing instruction, which appears as a pseudo-attribute on the XML declaration:

As ugly as this first appears, it seems sound to leave the decision to
the policies/processes of the document originator, be it author or
organization.  It will be interesting to see what develops as common
practice.  This looks like another candidate for the editor to enforce
as a convenience to authors, who would probably exercise the
conservative options if they were aware of them.

>For interoperation with existing Web software, users of XML may desire to create documents
>which are simultaneously valid XML and processable by existing HTML browsers. The
>difference in the form of empty elements may be accommodated by using an XML-compatible
>version of the HTML DTD ...

Sanity check for me.  This says, in effect, that you can write <e/> or
<e></e> but never <e>.  Is that right?  In that case, the text as
presented applies to any existing SGML application that currently uses
<e>, right?  So this explanation should be written for the general case
of all SGML applications that use <e>, with HTML cited as a well-known
example.
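The reading above can be sanity-checked against an XML parser.  A
minimal sketch, assuming Python's xml.etree.ElementTree (which
implements the later XML 1.0 Recommendation): <br/> and <br></br> give
the same tree, while an SGML-style <br> with an omitted end tag is
rejected.  The element names are only examples, and try_parse is a
hypothetical helper.

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Return the parsed root element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


empty_tag = try_parse("<p>line<br/>break</p>")
start_end = try_parse("<p>line<br></br>break</p>")
sgml_style = try_parse("<p>line<br>break</p>")  # end tag omitted, as in HTML

# Both XML forms produce identical trees.
print(ET.tostring(empty_tag) == ET.tostring(start_end))  # True
# The bare start tag leaves <br> unclosed, so </p> does not match it.
print(sgml_style)  # None
```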

>The grammar is built on content particles (CPs), which consist of names, choice lists of content particles,
>or sequence lists of content particles:

What is the origin of the term "content particles"?  Are we invoking
dead poets again or avoiding subatomic theorists? ;-)
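Whatever its origin, the term describes a small recursive structure.  A
minimal sketch under stated assumptions: the class names Name, Choice,
and Seq and the function matches are hypothetical, occurrence
indicators and #PCDATA are left out, and matching uses simple
backtracking over the list of child element names rather than the
automaton a production parser might build.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Name:
    """A single element type name."""
    name: str


@dataclass(frozen=True)
class Choice:
    """A choice list (a | b | ...) of content particles."""
    items: Tuple["CP", ...]


@dataclass(frozen=True)
class Seq:
    """A sequence list (a, b, ...) of content particles."""
    items: Tuple["CP", ...]


CP = Union[Name, Choice, Seq]


def _ends(cp: CP, children: Tuple[str, ...], pos: int) -> Iterator[int]:
    """Yield each position at which cp can finish matching children from pos."""
    if isinstance(cp, Name):
        if pos < len(children) and children[pos] == cp.name:
            yield pos + 1
    elif isinstance(cp, Choice):
        for item in cp.items:  # any alternative may match
            yield from _ends(item, children, pos)
    else:  # Seq: match the items in order, trying every split point
        yield from _seq_ends(cp.items, children, pos)


def _seq_ends(items, children, pos):
    if not items:
        yield pos
        return
    for mid in _ends(items[0], children, pos):
        yield from _seq_ends(items[1:], children, mid)


def matches(cp: CP, children) -> bool:
    """True if the whole list of child element names matches the model."""
    children = tuple(children)
    return any(end == len(children) for end in _ends(cp, children, 0))


# Hypothetical model: (title, (para | list), sig)
memo = Seq((Name("title"), Choice((Name("para"), Name("list"))), Name("sig")))
print(matches(memo, ["title", "para", "sig"]))  # True
print(matches(memo, ["title", "sig"]))  # False
```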

BTW, should there be a note that tells an SGML hacker that the
minimization flags are gone, or do you wish to explain this 10,000 times
when the error reports come in?

> For compatibility reasons, the same Nmtoken may not occur more than once in the enumerated attribute
> types of a single attribute-list declaration.

Strong hint to WG8:  this should go away quickly before the XML
implementors get very far.
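The quoted constraint is easy to check mechanically.  In SGML it exists
because, with minimized start tags, a value token such as "draft" can
appear without its attribute name, so each token must identify exactly
one attribute.  A minimal sketch, assuming a hypothetical representation
of one attribute-list declaration as a mapping from attribute name to
its enumerated tokens; duplicate_tokens is likewise a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def duplicate_tokens(attlist: Dict[str, List[str]]) -> List[str]:
    """Return, sorted, the Nmtokens that occur more than once across all
    enumerated attribute types of a single attribute-list declaration."""
    counts = Counter(token for tokens in attlist.values() for token in tokens)
    return sorted(token for token, n in counts.items() if n > 1)


# Hypothetical declaration: <!ATTLIST memo status (draft|final) review (draft|done)>
memo_attlist = {"status": ["draft", "final"], "review": ["draft", "done"]}
print(duplicate_tokens(memo_attlist))  # ['draft']
```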

>Notation declarations provide a name for the notation, for use in entity and attribute-list
>declarations and in attribute-value specifications, and an external identifier for the notation which
>may allow an XML processor or its client application to locate a helper application capable of
>processing data in the given notation. 

The term "client application" is introduced.  Can it be used in the
abstract as well?  The term "helper application" is also introduced.
It is informal, though understood.  Is there a better term to use here,
or is it necessary to introduce the concept informally?  I can't think
of a better term, but "helper" seems imprecise, as all it says is "give
this to someone else", and that could be a "plugin".

>Despite this, there are a small number of cases where XML fails to be a pure subset of SGML,
>including: 

Prefer the word "strict" to "pure".  This is not a moral or chemical
issue, but it is a legal one.  Some explanation of what it means to be
an "application profile" versus a "strict subset" is in order.
Otherwise, the rationale that follows amounts to "we thought it was
ugly so we threw it out".  Many of the decisions that violate SGML
strictures result from the design decision to enable DTD-less
processing.  This should be stated, along with the rationale for it.
Otherwise, since processing organizations can always agree in advance
to use identical DTDs, they can live without that design decision and
get the same effect.  So tie this to the design rationale and save
yourselves some headaches later.

>The following list describes features which are available in SGML but not in XML. It may not
>be complete.

Complete it.

*****************************************************************

Ok.  Good job on the writing and substance.  I understand the
requirement for the 20-page limit.  I do think clarity and completeness
should not take a back seat to brevity.

len bullard
lockheed-martin
Received on Monday, 18 November 1996 14:44:45 UTC