- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Sat, 11 Apr 2009 15:46:56 -0600
- To: John Arwe <johnarwe@us.ibm.com>, www-xml-schema-comments@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
In bug 6009 (http://www.w3.org/Bugs/Public/show_bug.cgi?id=6009), on 2 September 2008, John Arwe wrote:

> The following are passages whose interpretation I was unsure of.

Thank you for the careful reading and catalog of places where even a technically astute reader may stumble.

The wording proposal at http://www.w3.org/XML/Group/2004/06/xmlschema-1/structures.b6009.html (member-only link) shows, in context, the changes I propose to make on the basis of your comments, but many of your comments may benefit from a more direct response, so I am also sending you this email response, with a cc to the XSD comments list. I'll add a pointer to this email from Bugzilla but will not include the entire text there.

I've added separator lines and numbers to help the reader navigate and to help myself keep track, as I draft this response, of how far I have progressed through your comment and how far I have to go.

This email, and the current state of the proposal mentioned above, cover only the first dozen of your two dozen points; a second installment will be necessary to cover the rest.

---- 1 -------------------------------------------

> 2.2.1.1 Type Definition Hierarchy
> "A type defined with the same constraints as its ·base type
> definition·, or with more, is said to be a restriction."
> "A complex type definition which allows element or attribute
> content in addition to that allowed by another specified type
> definition is said to be an extension."
> I can read these together to say that a single type def may be both
> an extension and a restriction, although I know XSD syntax does not
> allow that. The obvious case is a "vacuous extension", i.e. one
> that adds no new element or attribute content. Yes?

Yes. Note added to say this explicitly.
---- 2 -------------------------------------------

> 2.2.1.2 Simple Type Definition
> "A simple type definition is a set of constraints on strings and
> information about the values they encode, applicable to the
> ·normalized value· of an attribute information item or of an
> element information item with no element children."
> This appears to say mixed=yes => never a simple type def. Yes?

It depends on what you mean by mixed content. In common usage, it refers to content which is a mixture of parsed character data and child elements. In that sense, your surmise is correct: if an element instance contains a mixture of character children and element children, it cannot be valid against any possible simple type.

More technically, 'mixed content' is often used, in discussions of SGML and XML DTDs, to refer to content models containing the token "#PCDATA" -- whether they require or allow the presence of child elements or not. (To my surprise, I don't find it defined as a technical term in ISO 8879, so I'm glossing it here from memory, not from the spec.) Certain properties of content models and of parsing behavior depend not on the presence of child elements in the instance but on the presence of '#PCDATA' in the content model. In that sense, your surmise would be slightly askew: a DTD content model of the form '(#PCDATA)*' might well correspond to what in XSD one would declare as a simple type.

In XSD itself, the term 'mixed content' is used only twice, once referring to DTDs with what I take to be the sense just given and once generically to the possibility that the children of an element (or a type) might be a mixture of characters (other than whitespace) and children.

More generally, the 'mixed' attribute on source declarations for complex type definitions corresponds to a particular value of the corresponding component's {content type}.{variety}.
In this context, depending on what exactly your surmise is taken to mean, it may be taken as (a) a category error, (b) a rough approximation but not completely correct, or (c) a simple and true statement.

(a) Category error: 'mixed' is a property of complex types (or, since I'm being pedantic: 'mixed' is a possible value of a property of the {content type} of a complex type definition). Simple types have no corresponding or analogous property, so one cannot say "a simple type has mixed=no" any more than one can say "the transmission of a simple type is automatic, not manual". mixed=yes => not a simple type definition, true. But the same is true for mixed=no: mixed=no => not a simple type definition, since mixed does not apply at all to simple type definitions.

(b) Rough approximation: where character data appears we may ask "are we dealing here with a simple type or no?" If we are in a context where child elements are also possible in principle, then we are not dealing with a simple type. True enough.

But note that from a formal point of view, xs:anyType has mixed content: it allows both child elements and character data. And xs:anySimpleType -- which is a simple type -- is a restriction of xs:anyType. Restriction never adds something that was not already present, at least notionally, so the formal story requires us to say that in some sense all the values and lexical representations associated with simple types are present in xs:anyType (even if for pragmatic reasons processors are not required to identify them to downstream applications). And in that sense I would be reluctant to affirm that mixed=yes => not a simple type def.

(c) Simple truth: Complex types may have empty content, simple content, element-only content, or mixed content. A complex type with simple content has an instance of a simple type as its content (i.e. the character sequence found in the input document is a legal lexical representation of a simple type, and maps to a value of that type).
For some complex type T, if T.{content type}.{variety} = mixed then T.{content type}.{variety} != simple. If that's what "mixed=yes => never a simple type def" means, then the answer is "yes".

Given the complexity of the situation I do not know of a way to address your comment in the spec text without ripping out section 2 and starting over. That might be useful in making the text easier to understand, but it would probably delay us by more than a day or two and prevent XSD 1.1 from ever becoming a W3C Recommendation, so I am loath to undertake the effort. If there is a simple change to make here that would have made this paragraph seem less confusing, I'll be happy to make it, but I have not yet found one.

---- 3 -------------------------------------------

> 2.2.2.1 Element Declaration
> "...by triggering identity-constraint definition ·validation·."
> My brain thinks you are calling out 'i-c def validation' as a
> special term, but the usual presentation evidence of that (dots on
> either side of a link) is absent.

We don't currently define 'identity-constraint definition validation' as a term; to try to set your brain at rest I have added a cross reference to the section on identity-constraint definitions.

---- 4 -------------------------------------------

> 2.2.2.2 Element Substitution Group
> "...name and content of an element must correspond exactly to
> the element type referenced in the corresponding content model."
> Seems to a novice reader equivalent to saying "to the governing type
> decl". If so, using that term _might_ be clearer even though it's a
> forward reference. Alternatively, their equivalence could be noted if
> it is in fact true.

Actually, this sentence is referring to XML DTDs, and is using the term 'element type' in a way familiar to DTD-oriented people, but perhaps less to others.
I've revised it to read:

  When XML vocabularies are defined using the document type definition syntax defined by [XML 1.1], a reference in a content model to a particular name is satisfied only by an element in the XML document whose name and content correspond exactly to those given in the corresponding element type definition.

  Note: The "element type" of [XML 1.1] is not quite the same as the ·governing type definition· as defined in this specification: [XML 1.1] does not distinguish between element declarations and types as distinct kinds of object in the way that this specification does; the "element type declaration" of [XML 1.1] specifies both the kinds of properties associated in this specification with element declarations and the kinds of properties associated here with (complex) type definitions.

---- 5 -------------------------------------------

> 2.2.2.2 Element Substitution Group
> "...Through the new mechanism of element substitution groups, "
> New? It was in 1.0. I realize via further reading it has changed
> (multi-head now allowed) but that seems like "improved" not "new".
> If the attempt was to distinguish it from "substitution groups",
> sans "element", I don't think it does so.

It seems to be hard for the spec to realize that the language it defines is no longer the new kid on the block. The word 'new' was true when it was written, as part of the text of 1.0. I've deleted it now.

---- 6 -------------------------------------------

> 2.2.4.2 Type Alternative
> "A type-alternative component (type alternative for short)
> associates..." The parenthetical seems to be here only for this
> component type. Seems like it should be done consistently (all or
> none).

I think it's motivated by the thought that a 'definition' or a 'declaration' is more clearly and obviously part of a schema than is an 'alternative'.
When we use the phrase 'type definition' instead of 'type definition component', few people outside the paper industry and the occasional very careful logician are disappointed or confused. The same did not seem to us to be true when we introduced the type alternative component and the phrase 'type alternative' to refer to such components. Hence the careful explanation.

Of course, the language sense of the XML Schema WG is affected by our long involvement with the material. I doubt that we are wrong in thinking that we need to explain that 'type alternative' is just short for 'type alternative component'. But are we perhaps wrong in thinking 'type definition' is not clear to a fresh reader as shorthand for 'type definition component'? If you tell me we are, I'll happily insert similar parentheticals throughout section 2. (Well, not happily. But I won't complain where you can hear me.) But I won't take the time solely for the sake of a consistency whose value does not seem obvious to me.

---- 7 -------------------------------------------

> 3.3.2.1 Common Mapping Rules for Element Declarations - XML Mapping
> Summary clause 2
> "2 otherwise (the <alternative> has a test) a Type Alternative
> with the following properties: Property {test} Value ·absent·."
> <alternative> HAS a test, {test} value is ABSENT. ???

I've recast the rule in an attempt to make clearer what is going on here.

The schema author can specify the {default type definition} of a type table in either of two ways: if the sequence of <alternative> elements ends in an <alternative> without a 'test' attribute, that last 'alternative' is taken as specifying the {default type definition}: it is as if the default test were "1 eq 1". If the final <alternative> does have a 'test' attribute, it's taken to be a normal alternative like the others and handled by the rule for {alternative} immediately above the passage quoted.
In that case, the element declaration's declared type is used as the {default type definition}.

The wording quoted is correct, even if your puzzlement is understandable. If the final <alternative> element has no test, then the {default type definition} is constructed from it; otherwise the {default type definition} has nothing to do with the final <alternative> and is constructed with an absent {test}. The 'test' attribute in the final <alternative> is not lost or ignored -- it turns up as the {test} property in the last of the {alternatives}.

The rule now reads:

  {default type definition}
    Depends upon the final <alternative> element among the [children]. If it has no test [attribute], the final <alternative> maps to the {default type definition}; if it does have a test attribute, it is covered by the rule for {alternatives} and the {default type definition} is taken from the declared type of the Element Declaration. So the value of the {default type definition} is given by the appropriate case among the following:

    1 If the <alternative> has no test [attribute], then a Type Alternative corresponding to the <alternative>.
    2 otherwise (the <alternative> has a test) a Type Alternative with the following properties:

      Property            Value
      {test}              .absent.
      {type definition}   the {type definition} property of the parent Element Declaration.
      {annotation}        the empty sequence.

The only change is the insertion of the explanatory sentence "If it has no ..."

---- 8 -------------------------------------------

> 3.3.1 The Element Declaration Schema Component
> FYI: The two paragraphs beginning with "Element declarations are
> potential members of the ·substitution groups·," are pretty hard to
> actually understand (the first more than the second, but the first
> depends on the second so they are linked).

I've suggested we recast this:

  The {substitution group affiliations} property of an element declaration indicates which substitution groups, if any, it can potentially be a member of.
  Potential membership is transitive but not symmetric; an element declaration is a potential member of any group named in its {substitution group affiliations}, and also of any group of which any entry in its {substitution group affiliations} is a potential member. Actual membership may be blocked by the effects of {substitution group exclusions} or {disallowed substitutions}, see below.

---- 9 -------------------------------------------

> 3.3.4.3 Element Locally Valid (Element)
> Validation Rule: Element Locally Valid (Element) clause 1
> When D and E both have namespace values of "absent", clause 1 seems
> to output "never valid". Is that the intent, do I mis-read?

The Namespaces spec says (in the passage linked to by the hyperlink):

  [Definition: An expanded name is a pair consisting of a namespace name and a local name. ]

If we allow the namespace name to be absent (as indeed both Namespaces and XSD do, with the phrases 'have no value' and 'have the value .absent.', respectively), it seems inescapable at least to me that the pair (a, .absent.) and the pair (a, .absent.) are identical. So yes, I think you are misreading this clause.

Would it help if the clause read not

  1 D is not ·absent· and E and D have the same expanded name.

but

  1 D is not ·absent· and the expanded names of E and D match.

with 'match' being a hyperlink to the definition of 'match' for expanded names (in section 3.9.4.1.2 Validation of Basic Terms)? The definition says, roughly, that two expanded names match if they are the same expanded name (and thus, by some lights, not two expanded names at all).

My instinct is not to change the text, since I think the current formulation is simpler, but I can be persuaded or outvoted. In the wording proposal, this change is marked not-status-quo to distinguish it visually.

--- 10 -------------------------------------------

> 3.3.5.1 Assessment Outcome (Element)
> "...with a [schema information] property..."
> FYI: Since I read this front to back, at this point I had not seen
> anything to tell me that 1.1 was introducing new properties, so this
> confused me. It eventually became clear of course. I wonder if a
> link or definition is warranted for new chunks like this.

I'm not sure I understand. Neither [validation context] nor [schema information] is a new property introduced by XSD 1.1; both are taken over without change from 1.0 (except that XSD 1.1 makes explicit that the [validation root] can be an attribute, which 1.0 passes over in silence).

The upshot is that I don't know what confused you here and can't attempt to fix it.

--- 11 -------------------------------------------

> 3.3.5.2 Validation Failure (Element)
> FYI: By this point, I figured out that you were defining new PSVI in
> some of the []'s since I saw the definition before the usage.
> [schema error code] got me to asking questions about its type
> (string? qname?) that I realize now I never asked about the PSVI
> properties I grew up with, so I'm not sure if those questions are
> actually fair. It does seem that there might be some value in
> making the error codes Qnames, to enable Schema processors invokers
> to clearly distinguish between "official standard" error codes and
> additional (potentially more informative) codes provided by the
> schema processor impl.
> I have heard folks operating in the business layer complain that
> standard schema error messages are inadequate generally to tell a
> user what in the instance is wrong, and therefore they use
> Schematron etc to pre-process instances and issue more
> domain-user-friendly messages.
At one point, the XML Schema WG intended to revamp the error codes of the spec, which seem to some readers to have a number of shortcomings (different readers, of course, identify different flaws): they don't actually cover all possible problems; they don't seem to be orthogonal (failure to satisfy one clause of one constraint may necessarily entail failing to satisfy a different clause of a different predicate -- which code should be used? both?); and the idea of ensuring that error codes can easily be hyperlinked to the relevant rule in the spec co-exists uneasily with the claim sometimes made that the spec is not intended to be comprehensible to naive users (only to writers of schema processors) and the observable fact that (independently of whether it should be or not) the spec is not written in such a way as to make it useful to end users seeking to find and fix problems in their data.

See bug 2843 http://www.w3.org/Bugs/Public/show_bug.cgi?id=2843
See also 2165 http://www.w3.org/Bugs/Public/show_bug.cgi?id=2165

Unfortunately, as our resources and time have grown short, it has become clear that we do not have the capacity to perform the front-to-back re-analysis of the spec that would be involved in defining a new set of error codes.

You are doubtless right that Schematron's ability to customize error messages helps make it more useful for end users; I believe that the initial design of the XSD error code system assumed that there would normally be some layer between the validator and the end user which could interpret the error code and give the user a useful message. The fact that there don't seem to be many such layers may suggest that the current set of error codes is not structured in a way that lends itself to exploitation by such an intermediate layer.

On the concrete question of the type of schema error code -- like other parts of the PSVI, the [schema error code] is an abstract label for some bits of information.
The spec defines no types for any of them, neither in terms of programming-language types nor in terms of XSD types or XML elements and attributes.

There was some interest in an API for XSD, but there was also substantial opposition from some WG members who did not wish to see W3C standardizing APIs ("We don' need no steeenking APIs" was the way one WG member put it to me, privately) and some development teams appear to have concluded that the description of the PSVI itself could suffice as an API, although it was not designed with that in mind and interpreting it as an API specification violates the essential premise of calling it an "information set" rather than an "API" or "document format".

There has also been some interest in XML representations of the PSVI, but nothing remotely resembling consensus; several proposals have been floated, and those who like one proposal generally regard the alternative proposals as unspeakably ugly, complicated, inadequate, or baroque, reflecting very badly on the taste or technical acumen of their designers. It's not the kind of reaction that encourages an effort to get all the designers together to seek a meeting of the minds.

--- 12 -------------------------------------------

> 3.3.5.2 Validation Failure (Element)
> "Note: If more than one ... fails to be satisfied," applies equally
> well to [schema error code], no?

In principle, no. The PSVI is an abstract account of some of the information generated during an assessment and with the exception of properties like [failed assertions] and [failed identity constraints] it is intended to be invariant, or as nearly so as possible. So [schema error code] is supposed to contain / is defined as containing codes for every error in the element or attribute instance it's attached to.
Some validators will expose only a subset of the PSVI, of course, but XSD 1.1 attempts to be clear that what happens in such cases is that the validator is exposing part of an abstract set of information which is in principle all always present, and not (for example) that the PSVI varies with the processor's choice of API. (XSD 1.0 vacillates between these views unhelpfully, partly because it keeps falling into the error of confusing information sets with APIs.)

Of course, if you are only going to expose the [validity] and [validation attempted] properties on the validation root, it can be a helpful optimization to stop validating as soon as you know what those values are going to be. If there are three local validity errors on the validation root, as well as an invalid descendant, you won't get them all if you stop on the first error. But in that case, strictly speaking, you aren't exposing [schema error code], just the part of it you calculated (and in principle at least your documentation should say so).

The [failed assertions] and [failed identity constraints] are defined differently, on the theory that even in the abstract a knowledge of all assertions which fail to be true need not be part of the PSVI. In drafting this response, I have come to believe that this is unmotivated by any design principle and is merely a relapse into the same mistaken view of information sets that is visible in parts of XSD 1.0. So I have proposed to replace the Notes you refer to with different notes that put the proposition differently. For assertions:

  [failed assertions]
    A list of Assertions that are not satisfied by the element information item, as defined by Assertion Satisfied (§3.13.4.1).

    Note: In principle, the value of this property includes all of the Assertions which are not satisfied by this element item; in practice, some processors will choose not to check further identity constraints after detecting the first failure.
    Such processors will expose a subset of the items in this value, rather than the full value.

And analogously for identity constraints.

This is a slightly vexed question (there are reasons that the WG keeps falling into the mistake of viewing the PSVI as an API). So while I will thank you for making me aware of this problem, you should be aware that others in the WG may not thank you for bringing this topic to the fore again.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Saturday, 11 April 2009 21:47:38 UTC