- From: Len Bullard <cbullard@HiWAAY.net>
- Date: Mon, 18 Nov 1996 13:45:08 -0600
- To: w3c-sgml-wg@w3.org
These are my comments on the Nov 14 draft.

>Extensible Markup Language (XML) is an extremely simple dialect of SGML which is completely
>described in this document.

Remove "extremely simple". Not able to technically evaluate that without some criteria.

>The goal is to enable generic SGML to be served, received, and
>processed on the Web in the way that is now possible with HTML.

Replace "generic SGML" with "this dialect of SGML" to clearly point out the difference. The term "generic SGML" is redundant in one sense, and not sensible in that any SGML is an application, not SGML. The use of the term, application profile, later is better in that it will be better understood in the standards community except for the confusion of SGML's common practice of using the terms DTD and SGML application interchangeably. Like three letter acronyms, all words are reserved by somebody. ;-) Unless a clarifying glossary is pointed to, you might want to include one in an appendix somewhere.

>For this reason, XML has been
>designed for ease of implementation, and for interoperability with both SGML and HTML.

As HTML is an application of SGML, interoperability should refer to the systems that handle applications of SGML and specifically in the sentence cited, the systems that handle HTML. This is grandfathering and an imprecise equivalence. Change this to interoperability with systems. It is better to use portability where you intend, "I send you this markup and you do what you do with it" and use interoperability to mean "I send you this command and you do what you must". If you send an object that has both, you mean both.

>Extensible Markup Language, abbreviated XML, describes a class of data objects called XML
>documents which are stored on computers, and partially describes the behavior of programs which
>process these objects.

Does the language specify a behavior or does the spec prescribe a behavior based on the type of data object?
It's quibbling, but unless there is a script in the XML, I expect it to only send data.

>XML documents are made up of storage units called entities, which contain either text or binary
>data.

Text is immediately defined. Binary is defined later as notation. Would it be prudent to note this here in some short form or change this to "text and non-text types defined by formal notation citation"? Ick.... Binary seems restrictive in that SGML, VRML, CGM, etc. are all notations and are not, of necessity, binary. The later explanation

>So-called binary data may in fact be textual, perhaps even
>well-formed XML text; its identification as binary means that an XML processor need not
>parse it in the fashion described by the specification.

doesn't justify the use of the term, "binary". Why is this term used when all it appears to mean is "not constrained by XML specification"?

>A software module called an XML processor is used to read XML documents and provide
>access to their content and structure. It is assumed that an XML processor is doing its work on
>behalf of another module, referred to as the application.

Processor is fine. I believe HyTime uses the term "engine". Are these equivalent? The term "application" is loose and may confuse the SGML user. Again, there are no unreserved terms left. Would it be better to use the terms server and client?

1.5 Syntactic Constraints

The tables should be in an appendix unless that makes them non-normative. Examples would suffice.

>Entities must each contain an integral number of elements, comments, processing instructions,
>and references, possibly together with character data not contained within any element in the
>entity, or else they must contain non-textual data, which by definition contains no elements.

Just to be sure, does this preclude the use of XML to reference an entity that contains other SGML/non-XML application data?
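A rough sketch of the "binary" entity and processor/application points raised above. It is written against XML 1.0 as implemented by the expat bindings in Python's standard library, not against the Nov 14 draft, whose syntax may differ. The sample document, the cgm notation, the logo.cgm file name, and the scan function are all invented for illustration. The parser plays the XML processor, the handler callbacks play the application, and the entity declared with a notation is reported but never parsed.

```python
import xml.parsers.expat

# Hypothetical document: one notation and one unparsed ("binary") entity.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA)>
  <!ATTLIST doc fig ENTITY #IMPLIED>
  <!NOTATION cgm SYSTEM "cgm-viewer">
  <!ENTITY logo SYSTEM "logo.cgm" NDATA cgm>
]>
<doc fig="logo">hello</doc>"""


def scan(text):
    """Parse text and return (notations, unparsed_entities, character_data)."""
    notations, unparsed, chars = [], [], []
    parser = xml.parsers.expat.ParserCreate()

    # The processor reports each notation declaration to the application.
    def on_notation(name, base, system_id, public_id):
        notations.append((name, system_id))

    # The processor reports the unparsed entity; it never opens logo.cgm.
    def on_unparsed(name, base, system_id, public_id, notation_name):
        unparsed.append((name, system_id, notation_name))

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed
    parser.CharacterDataHandler = chars.append  # may fire in several chunks
    parser.Parse(text, True)
    return notations, unparsed, "".join(chars)


if __name__ == "__main__":
    print(scan(DOC))
```

Note that the entity's content is identified only by its notation name; whether the file behind it is CGM, VRML, or more SGML is left to whatever the application does with that name, which is the sense in which "binary" here reads as "not constrained by the XML specification".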

>Users may extend the ISO 10646 character repertoire, in the rare cases where this is necessary, by
>exploiting the private use areas.

An editor design note to explain the ramifications of this might be prudent. Another tack is to add such to a follow-on "The Annotated XML Specification" which could be privately written and published. This is being done with VRML to help implementors with areas of the spec that need explanation, not clarification, and to document design decisions. Clear transcripts of all design discussions and decisions are useful for this task.

> Comments may appear anywhere except in a CDATA section, i.e. within element content, in
> mixed content, or in a DTD subset. They may not occur within declarations or tags. They are
> not part of the document's character data; an XML processor may, but need not, make it
> possible for an application to retrieve the text of comments. For compatibility, the string --
> (double-hyphen) may not occur within comments.

Is the use of "may" correct in this section as defined earlier?

>Processing instructions (PIs) allow the XML processor to pass instructions directly to
>selected applications.

Since PIs have a deprecated heritage and undeserved bad reputation, should some explanation of the intent for their use be added in a non-normative appendix? This will come up again in later discussions of linking and external interface design to XML processors.

> CDATA sections begin with the string <![CDATA[ and end with the string ]]>

Bugugly but ok. We are used to it and an editor can hide the acne scars.

>In element content, all white space (S) is ignored; validating XML processors must not pass it
>to the application. Non-validating processors which do not read the DTD must treat all
>elements as if they were declared with mixed content; this will in some cases result in a different
>parse tree from that produced by processors which do read the DTD.

Another candidate for design notes.
The impact of "a different parse tree" should be noted.

>The white space handling mode is signaled through the use of a reserved attribute; XML
>processors must behave as though every element encountered in the document had an attribute
>declared thus:

TripleBugUgly. In effect, to get around having a DTD, the DTD information moves into the instance. I don't know any other way though unless XML application communities are given a way to meta-declare this information or put it in the stylesheet. BTW, which is the default behavior if not specified in the instance? Is it undefined or left to the implementor? An argument to support this approach is that any editor worth having will automatically insert this attribute value for elements which clearly require it.

>A document author can communicate whether or not
>DTD processing is necessary using a required markup declaration (abbreviated RMD)
>processing instruction, which appears as a pseudo-attribute on the XML declaration:

As ugly as this first appears, it seems sound to leave the decision to the policies/processes of the document originator, be it author or organization. It will be interesting to see what develops as common practice. This looks like another candidate for the editor to enforce as a convenience to authors who would probably exercise the conservative options if they are aware of them.

>For interoperation with existing Web software, users of XML may desire to create documents
>which are simultaneously valid XML and processable by existing HTML browsers. The
>difference in the form of empty elements may be accommodated by using an XML-compatible
>version of the HTML DTD ...

Sanity check for me. This says in effect, you can <e/> or <e></e> but never <e>. Is that right? In that case, the text as presented applies to any existing SGML application that currently uses <e>, right?
So, this explanation should be written to the general case of all SGML applications which use <e>, with HTML cited as a well-known example.

>The grammar is built on content particles (CPs), which consist of names, choice lists of content particles,
>or sequence lists of content particles:

What is the origin of the term, content particles? Are we invoking dead poets again or avoiding subatomic theorists? ;-) BTW, should there be a note that tells an SGML hacker that the minimization flags are gone, or do you wish to explain this 10,000 times when the error reports come in?

> For compatibility reasons, the same Nmtoken may not occur more than once in the enumerated attribute
> types of a single attribute-list declaration.

Strong hint to WG8: this should go away quickly before the XML implementors get very far.

>Notation declarations provide a name for the notation, for use in entity and attribute-list
>declarations and in attribute-value specifications, and an external identifier for the notation which
>may allow an XML processor or its client application to locate a helper application capable of
>processing data in the given notation.

The term "client application" is introduced. Can it be used in the abstract as well? The term "helper application" is introduced. It is informal though understood. Is there a better term to use here or is it necessary to introduce the concept informally? I can't think of a better term, but helper seems imprecise as all it says is "give this to someone else" and that could be a "plugin".

>Despite this, there are a small number of cases where XML fails to be a pure subset of SGML,
>including:

Prefer the word "strict". This is not a moral or chemical issue but it is a legal one. Some explanation of what it means to be an "application profile" vs a "strict subset" is in order. Otherwise, the rationale that follows equates to "we thought it was ugly so we threw it out".
Many of the decisions which violate SGML strictures are a result of the design decision to enable DTD-less processing. This should be stated, along with the rationale for it. Otherwise, since any processing organization can agree in advance to have identical DTDs, they can always live without that design decision and get the same effect. So, tie this to the design rationale and save yourselves some headaches later.

>The following list describes features which are available in SGML but not in XML. It may not
>be complete.

Complete it.

*****************************************************************

Ok. Good job on the writing and substance. I understand the requirement on the 20 page limit. I do think clarity or completeness should not take a backseat to brevity.

len bullard
lockheed-martin
Received on Monday, 18 November 1996 14:44:45 UTC