- From: DuCharme, Robert <DuCharmR@moodys.com>
- Date: Thu, 13 May 1999 11:15:51 -0400
- To: "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Comments on the 6-May-1999 "XML Schema Part 1" Working Draft
============================================================

I like a lot of it, but I've limited this message to comments on general concepts, practices, and choice of terminology that I feel need revision.

In my marked-up hard copy, I have noted typos and suggested revisions for clarity that would fall into the category of "copyediting" and "basic Strunk and White stuff" (turning passive sentences into active, etc.) if the WG is interested in seeing them at this stage. As an example: the single sentence--and what a sentence--from section 3.5,

"No element type is referenced by more than one of the explicit and acquired content models (unless two or more acquired models share modelElts acquired from a common ancestor, in which case such modelElts shall be ignored in all but the first for the purpose of constructing the effective model), in which case if the non-vacuous explicit and acquired models are all eltOnly the effective model is a sequence of all the non-vacuous acquired models, in the order in which they are specified in the refinements list, followed by the explicit model (if it is non-vacuous), or else if the non-vacuous explicit and acquired models are all mixed, the effective model is a mixed whose elementTypeRefs and elementTypeDecls are the union of the elementTypeRefs and elementTypeDecls of all the non-vacuous explicit and acquired models."

could be revised into the following four sentences over two paragraphs:

"No element type is referenced by more than one of the explicit and acquired content models (unless two or more acquired models share modelElts acquired from a common ancestor, in which case the processor shall ignore all but the first modelElt when constructing the effective model). There are two ways this can happen: either the non-vacuous explicit and acquired models are all eltOnly or they are mixed.
If they are all eltOnly, the effective model is a sequence of all the non-vacuous acquired models, in the order in which the refinements list specifies them, followed by the explicit model if it is non-vacuous. If the non-vacuous explicit and acquired models are all mixed, the effective model is a mixed content model whose elementTypeRefs and elementTypeDecls are the union of the elementTypeRefs and elementTypeDecls of all the non-vacuous explicit and acquired models."

My revision may betray a misunderstanding of the original's meaning, but the general idea of breaking down overlong sentences (134 words!) into multiple sentences or even bulleted lists would make the Structures document much easier to understand.

I've broken down the rest of this into three sections: General Issues, Terminology, and Specifics by Section.

*General Issues

Examples: some, like those throughout section 3.5 "Archetype Refinement," are excellent, with good explanations using complete sentences and examples that use real-world names to make the purpose of the demonstrated constructions clearer. However, many if not most examples in the Structures document merely demonstrate the syntax of a construction without giving any clues as to how and why it is used. Demonstrating the declaration of a foo with the example "<foo name="myFoo">" tells readers nothing about the purpose of a foo that they couldn't find out from Appendixes A and B (ironically, the brief comments in Appendix B's DTD sometimes explain the purpose of certain constructs better than any part of the Structures document itself). Values of "name1" or "name2" for the name attribute are no better.

HTML rendering: references to sections within the Structures document look the same as links to definitions, making a sentence like "See <ul>Element Type Declaration</ul> for discussion and examples of the appearance of <ul>elementTypeDecl</ul> above" (3.4.6) difficult to read.
I suggest either italicizing section titles in these references or adding the phrase "the section" in front of them--for example, "See the section <ul>Element Type Declaration</ul> for discussion..."

*Terminology

Many important new terms are used repeatedly before they are defined. For example, the revised paragraph above uses the term "vacuous," which hasn't been defined yet, five times; "archetype" and "NCName" are also used repeatedly before any clues about their meanings are given. A complete definition at first use of each new term may cause structural problems, but an abbreviated, parenthesized definition at first use (section 3.4's definition of SC is a good model), with a pointer to the full definition, would make the document much easier to understand for readers whose first introduction to the proposal is a cover-to-cover reading of this document.

Perhaps an introductory overview like the SOX Note's "Structure of a SOX Document" would be a good place to first bring up these concepts and terms. It would make the remainder of the spec much easier to read. If a new section isn't added, at least more entries could be added to section 2.4.

The Structures document's frequent misuse of parts of speech (for example, using verbs like "include," "specialize," and "import" as both nouns and adjectives, "specialize" as a noun, and an adjective like "fixed" as a noun) makes it very difficult to read. I can only imagine what it's like for someone not speaking English as a first language. To say "this is a technical usage" is no excuse unless there is a good precedent (Knuth, dragon book, etc.) for a given term. Otherwise, that's like saying "we're computer people, it's OK for us, deal with it." See more about this on "include" below.

When a non-noun (for example, "specialize") is used as a noun because it's a token (that is, the lhs of some production in the document), references to it would be easier to read if described as "a specialize token" (or constraint, or whatever).
This is done nicely in the comment before Appendix A's element type declaration for archetype: "It may include a refines element that specifies..." Other places in the Structures document would have put this "It may include a refines that specifies..." Obviously the former is clearer.

Vacuous: this is a pejorative term, and therefore more colorful than any alternatives that I'm sure were considered, but do you need this much color? "Vacant" would be more appropriate.

Refine: The standard English use of the term gets twisted too far. To "refine" something is to change it, not to create a changed copy. I assume that "inherit" was considered and rejected, although I don't understand why, especially considering the associated vocabulary brought along with it, like "ancestor" and "daughter."

Daughter: I assume that this is used instead of "children" because of the latter's use in referring to contained elements. "Son" would be considered sexist, but so is "daughter." To me, "daughter" implies that there is a binary distinction between two types of descendants. (What if red-black trees had been called "son-daughter" trees?) Why not just call these "descendants"?

Export: as with "refine," the use of the term has something in common with the standard English usage but also something significantly different from it, which will confuse people. To export something is to actively send it somewhere, whether you're sending bourbon from Kentucky to Japan or a comma-delimited file from Excel to a named directory. To merely make something available for import does not export it. (On the other hand, "import" as used in the schema spec does make sense.)

Nearly well-formed: the term "nearly" adds vagueness that doesn't help any specification. "Nearly well-formed" says that a document falls short of complete well-formedness and that we're not sure where it falls short.
For a document whose incompleteness in meeting a certain ideal can be specifically identified (as "nearly well-formed" is used in the document) a term like "adequately well-formed" would be more appropriate.

include (as a noun): This is well-understood by programmers, but I don't consider it a technical term. Like the term "dialog" to refer to a dialog box, it's programmer slang. The Merriam-Webster dictionary has no listing for "include" as a noun, but it does define "inclusion" as "something that is included." For a more computer science way to say it, "included external resource" would also make sense. The last paragraph of 4.7, in addition to using "include" as a noun, also uses "included schema," which is much better.

Plural of "schema": the document uses the term "schemata" several times and "schemas" many more times. Either it should spell out a specific reason for using one over the other in certain contexts or it should pick one, identify it in the glossary definition of "schema" (just as a dictionary names a plural in a definition) and use it consistently. (My vote: "Schemas." As Orwell put it, "Bad writers, and especially scientific, political and sociological writers, are nearly always haunted by the notion that Latin or Greek words are grander than Saxon ones." http://www.bnl.com/shorts/stories/patel.html)

global and top-level: both are used several times in the document, but I couldn't find a definition of either in the document. I'm guessing that "top-level" means a non-nested elementTypeDecl. Whether I'm right or wrong, its meaning should be made more explicit.

*Specifics by Section

1) 2.1. definition of "Schema"
"...the information set of XML documents" is pretty broad; doesn't it mean "the information set of a particular class/collection/set/type of documents"? The Structures document never mentions the concept of a "document class" or "document type."
Does it ever describe a way to refer to a collection of documents conforming to a particular schema? Or do we just assume the use of the XML term document type?

2) 2.4 Purpose of "Archetype Definition," "Content Type," and "Element Content Model"
"Elements" in each of these is vague much like "documents" is in 2.1 as described above. Each use of the term looks like it refers to *all* the elements in a document instance; don't they mean "a specified class/set/type of elements," especially considering that each defined term is given in the singular?

3) 3.1 caption under second example
Does "new component" refer to a new component of a schema? A new class of components for a document? Who is the "we" doing the declaring? Isn't the schema doing the declaring? The distinction between creating, declaring, and specifying ("the specification for that component") in this sentence is confusing. Does the sentence mean "By declaring a new component, a schema associates that component's name with the specification for that component"?

4) 3.3, "Constraint on Schemas: One Reference Only"
"It is an error for both these attributes to appear on the same element in a schema." Then perhaps they shouldn't be attributes. If they were child elements of the import element type, a (schemaAbbrev|schemaName) equivalent in the content model would put this constraint in the schema language's concrete syntax, where its enforcement is more easily automated than that of a constraint that is only described in prose documentation.

5) 3.3, last paragraph
The use of the term "appropriate" (three times) is confusing.

6) 3.3, last paragraph
"...may also obtain." May also what?

7) 3.4.2 first paragraph
"...pertinent to elements in instance documents." See 2) above.

8) 3.4.4 Attribute Group Definitions
If I understand archetypes correctly, they can (among other things) group a collection of attribute definitions into a named, reusable unit, so I don't see what named attribute groups add to the schema language. What am I missing?

9) 3.4.9 first sentence
"An element type declares the..." should read "An element type declaration declares the..." An element type doesn't declare anything; it gets declared.

10) 3.5 "substitutability" definition
"One archetype is substitutable for another if any schema-valid instance of the former is necessarily..." The term "document instance" throughout the Structures document makes sense, as does the concept of an element instance. This line seems to be referring to an archetype instance, which I don't understand. Or does it mean "schema-valid element instance conforming to the former is necessarily..."?

11) 3.5 "NOTE" describing regularPolygon example
So the example's regularPolygon element is valid with respect to the polygon archetype, even though it has a "side" child element not mentioned by the polygon archetype declaration, because polygon has a "model" value of "refinable," right?

12) 3.6.1
"flavor can now be used in an entity reference in instances of the containing schema" as well as in document instances that conform to the containing schema, right?

13) 4.1 title
If "Instance Document Constructs" are different from "Instance Documents" then they should be defined. If not, the title should just say "Instance Documents."

14) 4.2 second example
The empty "export" element has an improperly closed XML comment.

15) 4.3 NOTE
"Head" is never defined. Does this mean right after the <schema> start-tag? Does it mean the very beginning of the document, or right after the XML declaration if there is one? It needs to be clarified.

16) 4.5 first paragraph
"Composed" is emphasized, but never defined. I assume it has no connection to compositor (production [36]).

17) 4.6 second example
I believe that second <import start-tag should be an end-tag.

18) 6.1 paragraph beginning "The provision within..."
"The effective element item of an element item (call this OEI)..." Why? What does the "O" stand for?

Overall, there's a lot of great stuff in the draft. I look forward to the software that can work with these schemas; kudos to Rick Jelliffe for jumping right in there!

Bob DuCharme www.snee.com/bob <bob@ snee.com>
"The elements be kind to thee, and make thy spirits all of comfort!" Antony and Cleopatra, III ii
Received on Thursday, 13 May 1999 11:08:25 UTC