- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 07 Jul 2003 11:41:25 -0400
- To: public-qt-comments@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear XML Query WG and XSL WG,

Below please find the I18N WG's comments on your last call document "XQuery 1.0 and XPath 2.0 Data Model" (http://www.w3.org/TR/2003/WD-xpath-datamodel-20030502/).

Please note the following:

- Please address all replies to these comments to the I18N IG mailing list (w3c-i18n-ig@w3.org), not just to me.
- All i18n-relevant comments are marked with ***. There are also general comments on the spec which we hope you will find useful.
- We have not yet reviewed the other documents, such as XQuery 1.0 or XSLT 2.0, and so we might be unaware of i18n issues that appear in these specs but may have to be traced back to the data model. There are also cases where we have identified an i18n issue here, but we are not sure exactly what the best solution will be.
- Our comments are numbered in square brackets [nn].

We look forward to further discussion with you.

General:

[1] In general, this is a very extensive and rather boring document. Where possible, it should be shortened and compacted, to make it easier to get the relevant points.

[2] There are mappings between the following different things:
- properties of nodes of the data model
- the corresponding accessors
- the mapping from the PSVI (post-schema validation infoset) to the properties
- the mapping from accessor output to XML infoset properties
(In the other draft we are reviewing, there are also functions corresponding to the accessors.) At least one of these could easily be removed (e.g. the properties or the accessors).

[3] 1. Intro: 'stylesheet or query' should be replaced by 'transform or query'.

[4] 2. expanded-QName: Does this allow handling of special cases such as XSLT that transforms XSLT, or XQuery that queries XQuery, ...?

[5] 3.2 Document order: "The relative order of nodes in distinct documents is implementation-dependent but stable.
In other words, given two distinct documents A and B, if a node in document A is before a node in document B, then every node in document A is before every node in document B." The second sentence sounds like a corollary of the first, but is a non sequitur. It could as well be that an implementation decides to order first all the first nodes from all the documents, then all the second nodes, and so on. If indeed all nodes of one document have to be before all nodes of another document, that should be said explicitly, and not only as 'in other words'.

[6] 3.3, markup of [Definition]. Using square brackets for indicating definitions doesn't look good at all. Also, there should not be a period both before and after the closing ].

[7] *** 3.3 data model support of values that are not supported by the XML Infoset: What about pcdata with associated language information? What about document fragments with associated inherited attributes in general? RDF is dealing with such things, and it would be very good if they could be handled.

[8] *** The handling of inherited attributes in general is an important issue for I18N (because of xml:lang) that wasn't dealt with at all in XSLT 1.0. Apart from what may be needed in the data model, support is also important on a higher level.

[9] 3.3 "The data model supports incompletely validated documents, but inconsistent data models are forbidden." What is an inconsistent data model? What actually happens when there is such a model? Does an error get thrown?

[10] 3.4 "In either case, the type names must also appear in the In-scope Schema Definitions (as defined in [XPath 2.0]) available to the processor." 'type name' or rather 'type definitions' or 'types'? There are anonymous type names, but it seems strange to say that these appear somewhere.

[11] *** anyType, anySimpleType, anyAtomicType, untypedAtomic, string, text nodes: This is a very general concern, but very important for internationalization.
There seems to be a proliferation of type variants dealing with the simplest of things in XML, namely simple text. This seems to ruin quite a bit of the benefits of using Unicode; now that we have solved the character encoding problems, we don't want to create arbitrary differences for simple pieces of text. But various specs (e.g. RDF as well) seem to come up with additional ways of creating arbitrary differences. anyAtomicType and untypedAtomic seem to be badly explained and justified. We have to make sure that, whenever possible, there are no arbitrary boundaries in functionality. Rather than treating string, text nodes, and untyped as three completely different things, they should work as much as possible in an overloaded way, similar to the number operators.

[12] *** 3.6.1 date and time mappings: For things with timezones, canonicalizing the time zone and then representing the original time zone separately seems to make sense. But for values without a timezone, representing them as if they were in UTC is inherently wrong and will lead to a lot of misunderstandings. (Having things with timezones and things without timezones as separate types would have been the better solution originally, and maybe it's still not too late for that.)

[13] 3.6.1, editorial: "Lexical representations that do not have a timezone are assumed to be in UTC for the purposes of normalization." -> "Lexical representations that do not have a timezone are assumed to be in UTC for the purposes of normalization ONLY."

[14] 4.1.6 typed-value: It would be good to have some explanation of what the idea/purpose of this accessor is. It seems strange that some cases produce errors. Why does mixed content produce a string, but complex content, a subset of mixed content, produce an error?

[15] *** 4.1.8 children Accessor: "The sequence of children will never contain adjacent text nodes." (see also 4.2.1) It is good that text nodes are always merged.
But this should be stated as a property of the data model, not just mentioned in an accessor description.

4.2.1 "The children must consist exclusively of element, processing instruction, comment, and text nodes if it is not empty. Attribute, namespace, and document nodes can never appear as children"

[16] - 'if it is not empty' seems irrelevant; obviously an empty document won't contain any nodes of other types either.

[17] - There should be a period at the end.

[18] 4.2.1, "Implementations that support DTD processing and access to the unparsed entity accessors, use the unparsed-entities property to associate information about an unordered collection of unparsed entities with a document node." Spurious comma.

[19] 4.2.2 typed-value: Why does document return the string value, but any of its elements could return an error?

[20] *** 4.2.2 and many other places: As far as we understand from previous discussions, xs:string is often used instead of xs:anyURI for convenience (to avoid additional casts). It is important in these cases to clearly state that the values actually have to be anyURIs, AND are treated according to anyURI syntax.

[21] *** 4.2.4: [character encoding scheme]: "The values of these properties are implementation-defined but must be consistent with the rest of the Infoset constructed." What does 'consistent' mean here? There is a dependency between non-ASCII element/attribute/... names and the encoding chosen. But for a data model that produces an infoset that is (not yet) intended for serialization, it almost seems that any specific value would be inappropriate. On the other hand, when actually being written out, at least for XSLT, the property is not implementation-dependent, but determined by the <output> element.
So we suggest the following text: "irrelevant during processing, determined by XQuery or XSLT for output"

[22] *** 4.3.1: processing instructions and comments: Is there a way to ignore these (if not in the data model, then in XQuery and XSLT)? Because they are not part of the actual text, ignoring them is often desirable. In that case, the text nodes should merge automatically.

[23] 4.3.2 "If the element node's type is xs:anyType, the dm:typed-value accessor returns the node's string value as xs:anySimpleType. If the type is a complex type with complex content, invoking dm:typed-value raises an error." Doesn't anyType include complex types?

[24] 4.3.2: One additional accessor: Why is this accessor not listed in the table?

[25] 4.3.3: Are xml:base attributes treated as special attributes or like namespace declarations?

[26] 4.4.1: "Attribute nodes encapsulate XML attributes": 'represent' may be better than 'encapsulate'.

[27] 4.4.2: The details about typed-value are useless duplications. It would be better to specify this very clearly in one single place, and just point to it from other places.

[28] 4.4.3: "The xs:QName IS computed..."

[29] 4.4.4: [owner element] -> [parent]

[30] *** 4.5.1: uri -> anyURI (or an equivalent explanation)

[31] *** 4.8.3: "The string-value is not W3C normalized as described in the Character Model for the World Wide Web version 1.0 draft." This may be misunderstood as meaning that the string value has to be non-normalized. It should at least be clarified as follows: "The string-value is not necessarily W3C normalized as described in the Character Model for the World Wide Web version 1.0 draft. It is the responsibility of data providers to provide appropriately normalized text, and the responsibility of programmers to make sure that operations do not de-normalize text." Even better clarification, in particular of the first sentence, is highly desirable, to clearly say that this refers to a state, and not an action.

[32] 5.
"The values of nodes whose type is derived by union from an XML Schema primitive type are represented by a sequence of atomic values each of whose type is one of the individual types from the union. The union type information is lost and only the specific types of each individual item is retained." this seems to apply to lists of unions, or maybe unions of lists, but not to simple unions. This should be clarified. [33] 5. "Using the canonical lexical representation for atomic values may not always be compatible with XPath 1.0.": Please say when this is not the case. D. Example: [34] *** xml:lang should be used in the instance, not only appear in the schema (and in the schema be allowed higher-up so that it can be inherited) [35] *** Defining a default currency in the schema is bad design practice. Without the schema, the data is basically useless. Please choose something different for an example of default attribute handling. [36] *** The monetaryAmount type works well for some currencies (USD, EUR,...), but does not work for others (Yen,...). Please generalize. The number of fractional digits needed currently is 0, 2, or 3. for details, please see: http://www.bsi-global.com/Technical+Information/Publications/_Publications/t ig90x.doc [37] *** The pop-culture example may make it difficult for non-native readers to understand the example, or to create a reasonable translation. [38] - "Literal strings are shown without the xs:string() constructor" this should say that strings are shown in quotes [39] - Why are N1-N5 before P1 and E1? [40] - A4: why is typed-value xs:token? [41] - typed-value of E5: inconsistent. [42] - other inconsistencies include: children(E5)->T2, string-value(A7), (A8), (A9), (A10), (A11) (string and typed values seem out of sync) [43] - Graphic representation of the data model. [large view]: This should be provided in SVG Regards, Martin.
Received on Monday, 7 July 2003 11:42:55 UTC