- From: Ronald Bourret <rbourret@ito.tu-darmstadt.de>
- Date: Tue, 8 Jun 1999 12:25:45 +0200
- To: "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
OVERALL COMMENTS:
================

This is a very nice first draft. It is well-organized, mostly readable (a bit thick in places), and fairly thorough. I'm really happy to see that a lot of work went into usability features such as archetypes, named content models, attribute groups, etc. My apologies that these comments are so late.

MAJOR COMMENTS:
================

Section 3.5 -- Archetype Refinement

Archetype refinement is very scary. I have no real technical grounds for complaint -- it just feels overly complex and of limited use with respect to most other features. While I understand the motivation, I would rather see it postponed to a later version.

Section 3.6 -- Entities and Notations

I strongly suggest that you split entity declarations into a separate language, just as data types are a separate language. While I concede the need to declare entities in instance syntax (for example, in the Fragments spec), they shouldn't be mixed together with the logical declarations. The primary reason for this is that entities can be defined on a per-document basis, while the logical declarations are defined on a per-document-class basis. Mixing the two makes it difficult/impossible to define schemas that are useful for an entire class of documents.

MINOR COMMENTS:
==============

Section 2.3 -- On 'types'

What is gained by making the distinction between definitions and declarations? In particular, this is reflected in the element tag names and is unlikely to be understood by the unwashed masses (like me) who are writing schemas.

Section 3.1 -- The Schema

a) What is the relationship between schemaIdentity and schemaName and why are they separate? My first guess was that schemaName gave the location of a schema document (for example, for use in an import statement) and that schemaIdentity gave the (single, unique, unchanging) ID of the schema. However, this theory doesn't seem to be true, as schemaRef refers to schemaName, not schemaIdentity.
I suspect that schemaIdentity should be replaced by schemaName (or vice versa).

b) Why are schemaIdentity and version separate? In my mind, a different version of a schema should have a different identity -- I certainly don't want to try to validate a document using version A of a schema against version B of that schema.

c) The Unique Definition constraint says that the same NCName cannot be used for two definitions or declarations of the same type. However, unparsed entities and parsed general entities occupy a single symbol space in the schema language. Is this correct? I thought these had separate symbol spaces, but I can't find anything in the XML spec that actually states this.

Section 3.2 -- The Document and its Root

The ability to declare a root element type is useful. For example, an application that reads a schema document might reasonably expect it to start with a <schema> element and consider that document to be invalid otherwise. I think this ability should be added as an option.

Section 3.3 -- References to Schema Constructs

a) What is gained by the ability to reference a schema by both its abbreviation and its name? Except for showing off typing skills, are there any good reasons to refer to a schema by its name? If not, remove this ability.

b) Get rid of the schemaAbbrev attribute and use prefixed names as is done with namespaces. For example, <elementTypeRef name="HTML:BLOCKQUOTE"/>. This is much easier to read and more intuitive for people accustomed to namespaces. The difference in processing cost is not significant.

Section 3.4.1 -- Datatype Definition

a) Specialization of data types at point of use is too flexible and too likely to cause confusion. Instead, require people to define and use new data types. However, declaring default values at point of use should be retained, as this applies to the use of the type and not the type itself.

b) Why is fixed part of the data type qualification?
This has nothing to do with data types and should be part of the attribute or element type constraint.

c) What issues are there about aggregate data types that need to be resolved? An aggregate data type is simply an element content model.

Section 3.4.2 -- Archetype Definition

What is a default element value and when is it applied? When the element is empty? When it is missing? If the latter, and more than one instance of the element type is legal, how many are created? My gut feeling is to delete this.

Section 3.4.3 -- Attribute Declaration

Why should there be a default attribute data type? There isn't now.

Section 3.4.6 -- Mixed Content

a) The use of a <mixed> element with no children to indicate PCDATA-only content is not intuitive to most users and is likely to lead to confusion. Either:

   i) Add a <pcdata> element, or
   ii) Require one or more elementTypeRef's under <mixed>. PCDATA-only content is stated with <datatypeRef name="string"/>.

   I prefer (ii), as it means there is only one way to declare PCDATA-only content.

b) What is the purpose of the NOTE? That is, why is it important to be able to declare PCDATA-only content without using the above datatypeRef?

Section 3.4.9 -- Element Type Declarations

Locally-scoped element type names break XML 1.0 validity and are probably not worth the confusion they will cause -- remember that most document authors are not programmers and are not likely to understand scoping. I suggest you delete them.

Section 4.1 -- Associating Instance Document Constructs with Corresponding Schemas

What is the relationship between schemaIdentity, schemaName, and the namespace URI? My guess is that all should be the same, but this is never stated.

Section 4.2 -- Exporting Schema Constructs

What is the motivation for export control? It doesn't seem applicable to XML. Unlike programming languages, where implementation details can be hidden from the user, everything in an XML document is visible.
Saying that I can see an archetype, attribute, content model, etc. but can't use it is just plain silly. It doesn't really mean I can't use the schema object -- it just means that I have to cut and paste instead of using the schema language's handy, built-in referencing mechanisms.

Section 4.6 -- Import Restrictions

a) The note asks whether imported definitions are re-exported and states that they are not due to difficulties in managing abbreviation associations. I don't understand this -- such difficulties must already be handled. For example, suppose schema A imports element b from schema B, and element b includes element c imported from schema C. If schema B uses the same abbreviation for schema C that schema A uses for schema B, the processor must resolve this today. This is not difficult, as the processor maintains abbreviation lists on a per-schema basis and does its processing based on schemaIdentity/Name, not the abbreviation. Thus, imported definitions should be re-exported.

b) The note in section 4.7 asks whether import implicitly imports features not explicitly imported or only imports such features when needed by explicitly imported features. The latter case is the correct one, as it makes the schema author's intentions clear and forces them to import exactly what they want. (It also saves memory in the processor, which only needs to save those schema items needed by the importing schema, rather than the entire imported schema. Whether the memory saved is significant depends on the size of the imported schema and how much was imported.)

Section 4.7 -- Schema Inclusion

a) Does schema inclusion solve any problems that can't be solved with external entities? If not, delete it.

b) If includes are kept, it *must* be an error if identically named items are encountered twice. Using the first definition is a bad practice and open to abuse. (A reasonable compromise would allow multiple, identical definitions.)
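
As a rough illustration of the policy suggested in 4.7 b), here is a minimal Python sketch. The names merge_definitions and DuplicateDefinitionError, and the representation of each definition as a name mapped to a normalized string, are assumptions made only for this example; nothing here comes from the draft.

```python
class DuplicateDefinitionError(Exception):
    """Raised when two definitions share a name but differ in content."""


def merge_definitions(including, included):
    """Merge the included schema's definitions into a copy of the includer's.

    Both arguments are dicts mapping a definition name to a (hypothetical)
    normalized form of its content. A repeated name is tolerated only when
    both definitions are identical; otherwise it is an error, rather than
    silently keeping whichever definition was seen first.
    """
    merged = dict(including)
    for name, body in included.items():
        if name in merged and merged[name] != body:
            raise DuplicateDefinitionError(
                f"conflicting definitions for {name!r}"
            )
        merged[name] = body
    return merged
```

With this policy, including a schema that repeats an identical definition succeeds, while a differing definition under the same name is reported instead of being dropped silently.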

Section 4.8 -- Access to Schemata

Basing a schema's location on its name (namespace URI, schemaName, or schemaIdentity) is a bad idea. A generic schema processor, such as a generic validation module or a schema-driven editor, has only one realistic choice when it comes to locating the schema document, and that is to hope that the URI is a URL and try to resolve it. (I don't believe that schema name servers are going to appear any time soon, if ever.) Unfortunately, this forms a one-to-one relationship between schema names and locations, which precludes multiple copies of the schema. It also means that, in most cases, the processor must be connected to the Web.

One possible solution is to separate location information from the schema's name. This is needed in import and include statements within the schema and also in whatever mechanism (not namespace declarations) is used to associate schemas with instance documents. The location information can still be a URI, and the mechanism by which the URI is resolved to an actual location can still be processor-specific, but this does allow generic processors to simply resolve the URI as a URL or fail.

Section 5 -- Documenting schemas

In considering documentation elements, please consider the following:

a) Display names, which can be different from item names. For example, I want an element type of SalesOrder, but I want this to result in a form name of Sales Order.

b) Support for multiple languages (English, French, etc.)

Section 6.1 -- Schema Validity

a) The sentence after the definition of schema-ready for documents states that a document is schema-ready even if it has no namespace declarations. How? The definition of schema-ready for documents states that the document is schema-ready if all of its elements are schema-ready, and the definition of schema-ready for elements states that an element is schema-ready if any of its namespace declarations resolves to a schema.
This doesn't seem to cover the case where none of the namespace declarations resolves to a schema or there are no relevant namespace declarations. I think these cases need to be added explicitly to the definition of schema-ready for elements.

b) The definition of schema-valid allows partial validation (in the XML 1.0 sense) of documents. While this is undoubtedly useful, many (most?) applications will want full validity (in the XML 1.0 sense). I think you need a definition such as totally-schema-valid, which is the same as schema-valid except that all elements must be schema-governed, and a way for applications to request this.

c) In the description of how DTDs and schemas interact, please explain what happens when the DTD and schema conflict. This is (perhaps unintentionally) covered in part by the first bullet, which allows for this case. I think it is OK to simply say that this is the document author's problem.

Section 6.2 -- Responsibilities of Schema-aware processors

Why is item 6 (exposing the combined information set) required for conformance?

TYPOS, ERRORS, NITPICKING, ETC.
==============================

Section 1.3 -- Relationship to Other Work

Consider mentioning the Fragment spec, which wants to use schema syntax (including entity definitions) to represent DTDs in line.

Section 2.4 -- Schemas and their component parts

The table states that archetype refinements are named. I assume this is an error, as I see no way to name a refinement (as distinct from an archetype). If there is a way to name a refinement, it should share the same symbol space with archetypes.

Section 3.3 -- References to Schema Constructs

The Consistent Import constraint states, "A schemaAbbrev or schemaName in a schemaRef must be declared in an Schema Import of the current schema, ..." I think it would be clearer to say "... in the current schema..." "Of" implies that the current schema is being imported somewhere else.

Section 3.4.1 -- Datatype Definition

The second paragraph states that "datatype[s constrain]...the character data contents of elements". This should be more specific and state that they constrain the character data content of elements that can only contain character data. Clearly, data types do not constrain character data when character and element content is present.

Section 3.4.2 -- Archetype Definition

Archetypes are really nice, but the name is obscure. How about "base type", "base element type", or "abstract type" instead?

Section 3.4.3 -- Attribute Declaration

In production [24], required should be followed by a "?". If it is understood to be a choice of required/not required, it needs a production of its own.

Section 3.4.4 -- Attribute Group Definition

Productions [26], [27], and [29] do not match the DTD in appendix B:

a) [26] should be: attrGroupSpec ::= (attrDecl | attrGroupRef)+ exportControl
b) [27] should be: attrGroupRef ::= attrGroupName
c) [29] should be deleted. (What is its purpose, anyway?)

Section 3.4.7 -- Element-Only Content

The example states that the default of maxOccur is 1. In fact, maxOccur has no default.

Section 3.4.9 -- Element Type Declarations

If locally-scoped element type names are retained, two changes are needed:

a) At the start of the fifth paragraph, change "An elementTypeDecl may also appear within a modelElt..." to "... within a modelElt or mixed..."

b) Clarify section 2.5 with respect to the symbol spaces of attributes and element types; at the very least, simply add cross-references to the relevant explanations (sections 3.4.3 and 3.4.9).

Section 3.6 -- Entities and Notations

Why are notations included with entities? Entities are a physical construct and notations are a logical construct.

Section 3.6.1 & .2 -- Internal/External Parsed Entity Declaration

a) Change "internal/external parsed entity" to "internal/external parsed general entity" to make it clear you are defining general entities and not parameter entities.
b) In the example in 3.6.1, change "... in instances of the containing schema..." to "in documents that use the schema..."

Section 4.1 -- Associating Instance Document Constructs with Corresponding Schemas

In the last sentence of the example, "content model" should be "archetype".

Throughout Entire Specification

a) There are numerous grammatical errors in the use of "a" and "an" -- admittedly minor, but annoying.

b) Any chance that "datatype" could be made two words again?

Thanks,

-- Ron Bourret
Received on Tuesday, 8 June 1999 06:28:23 UTC