- From: Peter Fankhauser <fankp@darmstadt.gmd.de>
- Date: Wed, 23 May 2001 11:10:13 +0200
- To: <www-xml-schema-comments@w3.org>
- Cc: <w3c-xml-query-wg@w3.org>
Hi,

find enclosed the XML Query Working Group review of

XML Schema: Formal Description
W3C Working Draft, 20 March 2001
http://www.w3.org/TR/2001/WD-xmlschema-formal-20010320/

Best regards,

Peter Fankhauser

----

General Comments:

Formalizing XML-Schema Part 1 (structures) is a daunting endeavour.
The editors have done a great job in formalizing some of the key
issues arising in the deployment of XML-Schema Part 1:

(a) normalized universal names for schema components
(b) a reasonably concise syntax for XML Schema to facilitate
    formalization
(c) a formal characterization of document validation

I've reviewed the document from the perspective of an editor of the
XML Query Formalization WD (in short XQF), checking to what extent
the XML Schema Formalization WD (in short XSF) provides a path to
fully align XQF's current type system with that of XML Schema.

Some of the most pressing open issues from this perspective are:

(1) Alignment of type system: how can the XDuce inspired type system
    of XQF be aligned with XML Schema? In particular:

    (a) how can we repair the undifferentiated notion of a "type
        variable" in XQF? That is: in XQF "type variables" are used
        for three different purposes:

        (a.1) element declarations
              type Bib = bib[book[]]
        (a.2) complex type definitions (which may recurse)
              type Part = Complex | Simple
        (a.3) model groups (which may not recurse)
              type Book = book[]{1,*}

    (b) how can we reflect the component-model of XML Schema?

    Section 3 in XSF goes a long way in the right direction.
    Moreover, Section 8 in the latest version of XSF (May 1,
    http://www.w3.org/XML/Group/xmlschema-current/formalization/formaldesc.html )
    describes a mapping from XML Schema components to XSF-sorts.
    Thus it appears that by adopting XSF sorts (maybe modulo a few
    (coordinated) syntactic adaptations) the XQF query type system
    can be well aligned with the XML Schema components.

(2) Type subsumption: what is a suitable, comprehensive
    characterization of type subsumption which takes into account:

    (a) type names
    (b) the type derivation hierarchy of XML Schema
    (c) element substitution groups
    (d) structural subsumption of model-groups and of elements with
        an anonymous type as content.

    The XML Query WG needs this to formalize subtype substitutability
    in functions and the static semantics of explicit type
    declarations. Currently, the XQF notion of subtype only takes
    into account (d), and ignores (a), (b), and (c). Conversely, the
    XML Schema notion of "subtype" only takes into account (a)
    through (c). More formally, XQF's notion of subtype amounts to:

        type1 subtypeOf type2 iff every instance valid for type1
        is also valid for type2.

    While XSF does formally characterize validation which is aware
    of (a) through (c) in Section 6 (Document Validation), it does
    not give a constructive method for deciding whether
    type1 subtypeOf type2 holds. XQF needs such a constructive
    method and the two groups should coordinate its design.

(3) Referring to types in the XML Query (XML Path 2.0) Datamodel

    The NUNs (normalized universal names) of XSF are great (at least
    in their unabbreviated form), and the discussion in Section 2.3
    shows rather clearly that ref(type) in the Datamodel can indeed
    be realized by NUNs.

So much for the (rather) good news. The bad news is that the document
is not really easy reading, although for the most part the editors
have done a great job in structuring the document well, and in
motivating and explaining the methods and concepts. Nevertheless, I'm
afraid that few readers will really fight their way through, which is
unfortunate: they would really miss something.

Partially, this difficulty is certainly due to the inherent
complexity of the problem. However, in some parts the presentation
could be improved:

(a) more mnemonic names for non-terminals rather than one- or
    two-letter categories.
(b) less creativity in inventing new syntax everywhere (e.g.
    for documents)
(c) more disciplined use of special characters (the document can be
    rendered in Amaya, but e.g. IE 5.x and the Acrobat Distiller
    rendering for HTML fail miserably).

I hope that some of the more detailed comments below can help to
improve readability here and there. Most of the comments are
editorial, some of them are about the content. I've tried to
prioritize them into minor/medium/major.

Detailed Comments:

Sec 1/Par 5 (editorial/minor)
-----------------------------

I wondered a while about "context free grammar", until it dawned on
me that this means "context free grammars as opposed to the
XML-syntax of XML-Schema", rather than "XML-Schema as a context free
grammar".

Sec 2.1 (content/medium)
------------------------

I like the unabbreviated syntax for NUNs, inspired by the axes in
XPath 1.0. I don't like the abbreviated syntax, because it overloads
.../foo/Foo, alternating between foo as element-name and Foo as
type-name. In addition, the different meanings of * (wildcard in
XPath, anonymous type in NUNs) are a bit confusing.

Here are a few alternatives. The abbreviated syntax could either not
abbreviate the type-axis

    type::u/d/type::*/a

or use a separate abbreviation for type-names, e.g.,

    %u/d/%*/a

N.b.: for a while I thought type::* for anonymous types should be
avoided altogether, but this didn't survive closer inspection.

Sec 2.2.1 (editorial/minor)
---------------------------

The abstract syntax a[g] for an attribute with name "a" is a bit
misleading (without having gone into the details about sorts in
Section 3.5). You may consider deferring a detailed exposition of
component content to 3.4. The example in 2.2.2 is helpful and should
stay, although an example with some "meaningful" names would work
even better.

Sec 2.3 (editorial/medium)
--------------------------

Some parentheses for the normalized elements with type information
added would improve readability:

    a[ u types ( t/@b ... ...] ) ]

Maybe one can do without a special syntax for documents (and
forests(?)) entirely:

    <a xsi:type="u", %t/@b=(xsi:string)"zero", %t/@c=(%s)"1 2">
      <%u/d xsi:type = "%u/d/%*">
        <%u/d/%*/a xsi:type="xsi:string">three</a>
        <%u/d/%*/a xsi:type="xsi:string">four</a>
      </u>
    </a>

This extends the XML 1.0 syntax as follows:

(1) use NUNs in start-tags and attribute names
(2) annotate attributes with their type, by a leading "(type)"
    (or "{type}", or ...)

This also illustrates the effect of schema validation on an
XML-document to XML 1.0 aficionados. They may not like it, but at
least they may understand it then, and continue to "watch the bits on
the wire".

Sec 3.1 (editorial/minor)
-------------------------

The introduction of special "name classes" (a,e,t) for three symbol
spaces (not to mention s,k,x..) accomplishes brevity, but impedes
readability. One might consider using more mnemonic abbreviations for
non-terminals, and avoiding "name classes" by using unabbreviated
syntax.

Sec 3.4 (content/medium)
------------------------

I wonder whether we don't also need a production:

    g ::= (g)

Sec 3.6 (content/medium)
------------------------

I wonder whether "element groups" are allowed to contain "type
names", and thereby also choice, sequence, etc. of "type names".
Shouldn't this be model group names?

Sec 3.8 (editorial/minor)
-------------------------

The use of "in" for expressing "instance d has type g" conflicts with
the use of "in" in Section 4.

Sec 4, General (editorial/minor):
---------------------------------

The special character for "=>" is not rendered on IE5.x. (I
substituted it with "normalizesTo" in my local copy.)

I couldn't find a usage of the notation "x notin deref()". I also
don't understand why this isn't "x notin dom(deref())".

Sec 4, Rule for "Extend Attribute Transitive" (editorial/medium):
-----------------------------------------------------------------

x<:y is not yet defined. Please refer to Section 5 in the
explanation.

Sec 4, Rules for "Extend Attribute Base" and "Extend Element Base":
-------------------------------------------------------------------
(content/medium)

I don't quite understand the mechanics of these rules. Are "e" and
"a" already NUNs?

Sec 4, Rule for "Constant:" (editorial/minor):
----------------------------------------------

Where does the prime in "c'" come from?

Sec 4, Rule for "Untyped Element:" (editorial/medium):
------------------------------------------------------

This rule is a toughie. Some explanatory prose would help here.

Sec 5.1 (editorial/minor):
--------------------------

"x <: :x2" should say "x <: x2"?

Sec 5.3 (content/major):
------------------------

The model-theoretic definition of restriction needs to be elaborated
by a constructive/algorithmic definition. Here's a start (not taking
into account interleaving, model group names, attribute group names,
groups in parentheses, mixed content):

Empty Sequence:

    -----------------
    eps <:_res g{0,n}

Empty Choice:

    ----------
    0 <:_res g

Sequence 1:

    g1 <:_res g1'    g2 <:_res g2'
    -----------------------------
    g1,g2 <:_res g1',g2'

Sequence 2:

    g1,g2 <:_res g1' or g1,g2 <:_res g2'
    ------------------------------------
    g1,g2 <:_res g1' | g2'

Sequence 3:

    g1 <:_res g{m1,n1}    g2 <:_res g{m2,n2}
    m1+m2 >= m, n1+n2 <= n
    --------------------------------------
    g1,g2 <:_res g{m,n}

Choice:

    g1 <:_res g    g2 <:_res g
    ----------------
    g1 | g2 <:_res g

Repetition 1:

    g{m1,n1} <:_res g1    g{m2,n2} <:_res g2
    m1+m2 >= m, n1+n2 <= n
    ---------------------
    g{m,n} <:_res g1,g2

Repetition 2:

    g{m,n} <:_res g1 or g{m,n} <:_res g2
    ------------------------------------
    g{m,n} <:_res g1 | g2

Repetition 3:

    g <:_res g'    m1 >= m2    n1 <= n2
    ---------------------------
    g{m1,n1} <:_res g'{m2,n2}

Attribute:

    g <:_res g'
    -----------------
    a[g] <:_res a[g']

Element:

    g <:_res g'
    -----------------
    e[g] <:_res e[g']

N.b. 1: This does not take into account substitution groups and
wildcards.

N.b. 2: This also does not take into account cases where the
content-type g is derived by extension from content type g'. That
would be as follows (but I'm not sure whether we want that):

    e <: e'    g <: g'
    ------------------
    e[g] <:_res e'[g']

Sec 5.4 (content/medium):
-------------------------

I think the rule should say "der in deref(x').derivation" instead of
"der = deref(x').derivation", and x' needs to be "wellformed" as
well:

    |- x'
    x' = deref(x).base
    der in deref(x').derivation
    deref(x).content <:_der deref(x').content
    -----------------------------------------
    |- x

Sec 6.1, Par 1 (content/major):
-------------------------------

Are the documents to be validated already in normalized form or not?
According to the last paragraph in Sec 6.1, they can be either. What
is the processing model behind this? When validating a document, does
one first normalize, then validate, or vice versa?

Sec 6.1, Rule "Typed Attribute" (editorial/minor):
--------------------------------------------------

The rule should be:

    d in s
    --------------------
    a[s types d] in a[s]

Sec 6.2
-------

I have not reviewed this Section...

Sec 7, Par 3 (editorial/medium):
--------------------------------

I don't understand: "d --> g (eg writes as g)"
"f" is probably the description fragment?

Sec 7.1 (editorial/medium):
---------------------------

What is "x" in all rules? Where does it come from, what does it
contain?

Generally, the mapping rules came out so badly in my printed version
that I did not review them in detail (but I think I got the general
idea...)
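
To illustrate how the Sec 5.3 rules might be read as a decision procedure, here is a minimal Python sketch. It is an illustration only, not part of XSF or XQF: the class names and the function `restricts` are hypothetical, it covers only the syntax-directed rules (Empty Sequence, Empty Choice, Sequence 1, Sequence 2, Choice, Repetition 2, Repetition 3, Attribute, Element), and it leaves out Sequence 3 and Repetition 1, which need a search over m1, n1, m2, n2. It also assumes reflexivity on structurally equal groups, which the rules do not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:            # the empty sequence "eps"
    pass


@dataclass(frozen=True)
class Zero:           # the empty choice "0"
    pass


@dataclass(frozen=True)
class Atom:           # hypothetical leaf, e.g. a simple type name
    name: str


@dataclass(frozen=True)
class Seq:            # g1,g2
    left: object
    right: object


@dataclass(frozen=True)
class Alt:            # g1 | g2
    left: object
    right: object


@dataclass(frozen=True)
class Rep:            # g{lo,hi}; use math.inf for an unbounded hi
    body: object
    lo: int
    hi: float


@dataclass(frozen=True)
class Elem:           # e[g]
    name: str
    content: object


@dataclass(frozen=True)
class Attr:           # a[g]
    name: str
    content: object


def restricts(g, h) -> bool:
    """Return True if g <:_res h is derivable with the rules covered here."""
    if g == h:                                  # assumption: reflexivity
        return True
    if isinstance(g, Zero):                     # Empty Choice
        return True
    if isinstance(g, Alt):                      # Choice
        return restricts(g.left, h) and restricts(g.right, h)
    if isinstance(g, Eps):                      # Empty Sequence
        return isinstance(h, Rep) and h.lo == 0
    if isinstance(g, (Seq, Rep)) and isinstance(h, Alt):
        # Sequence 2 / Repetition 2
        return restricts(g, h.left) or restricts(g, h.right)
    if isinstance(g, Seq) and isinstance(h, Seq):      # Sequence 1
        return restricts(g.left, h.left) and restricts(g.right, h.right)
    if isinstance(g, Rep) and isinstance(h, Rep):      # Repetition 3
        return restricts(g.body, h.body) and g.lo >= h.lo and g.hi <= h.hi
    if isinstance(g, Elem) and isinstance(h, Elem):    # Element
        return g.name == h.name and restricts(g.content, h.content)
    if isinstance(g, Attr) and isinstance(h, Attr):    # Attribute
        return g.name == h.name and restricts(g.content, h.content)
    return False                                # no covered rule applies


book = Elem("book", Atom("string"))
print(restricts(Rep(book, 1, 2), Rep(book, 0, math.inf)))
print(restricts(Rep(book, 0, math.inf), Rep(book, 1, 2)))
```

Every recursive call is on strictly smaller groups, so the check terminates. A False result only means that none of the covered rules applies; it does not show that the restriction fails under the model-theoretic definition.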
Received on Wednesday, 23 May 2001 05:03:15 UTC