- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 27 Apr 2005 21:14:09 +0900
- To: public-i18n-core@w3.org
Dear all,

Today I went through some of the reviews from Martin on the xquery-suite, and through the current state of the documents. Below you will find the result. I have done "XPath 2.0", "XSLT 2.0 and XQuery 1.0 Serialization", "XML Syntax for XQuery 1.0 (XQueryX)", "XQuery 1.0: An XML Query Language." It would be great if you could go through the comments and make suggestions for corrections, before we send them to the xquery wg. Tomorrow I will start with the data model and Martin's comments.

Best regards, Felix.

------------------------------------------------------------
Name of specification: XML Path Language (XPath) 2.0
Document: http://www.w3.org/TR/2005/WD-xpath20-20050404/
Main reviewer: Felix Sasaki (fsasaki@w3.org)
------------------------------------------------------------

Comments:

[1] General comment on references to URIs: Throughout this and other specs, please reference IRIs (RFC 3987, http://www.ietf.org/rfc/rfc3987.txt) instead of URIs. You often refer to the XML Schema data type xs:anyURI, e.g. "The URI value is whitespace normalized according to the rules for the xs:anyURI type in [XML Schema]." (sec. 2.1.1), but this data type itself is in its latest version defined in terms of IRI. Referring to IRI directly in your specification would make things clearer.

[2] Sec. 2.1.2, Definition of an "Implicit time zone" (http://www.w3.org/TR/xpath20/#eval_context). This has to be removed. Using implicit conversions between timezoned and non-timezoned dates and times is way too prone to all kinds of subtle and not so subtle bugs.

[3] Sec. 2.2.3.1, "The operation tree is then normalized ..." (http://www.w3.org/TR/xpath20/#id-static-analysis). There are many different normalizations in this series of specifications, like operation tree normalization in this section, white space normalization (sec. 3.10.2, 4c), Character normalization (Charmod, NFC etc.), normalization as described in the formal semantics document, sec.
3.2.1, point 3, and sequence normalization as described in the serialization specification, sec. 2. These should be very clearly distinguished and labeled. A section which summarizes the various kinds of normalization would be helpful.

[4] Sec. 3.5.1 (http://www.w3.org/TR/xpath20/#id-value-comparisons). The value comparison relies on atomization of the values; if these are nodes, the atomized value is returned as a typed value. You should make clear that this is quite different from the comparison of string values. This difference might be important for some i18n applications. Consider the following example:

<myEl1>bla<myEl2>Š</myEl2></myEl1>

If there is a schema which declares the type of myEl2 as empty, Š would not be part of the PSVI and the result of $myDoc/myEl1 eq "bla" would be true; otherwise it would be false.

[5] References: The reference to ISO/IEC 10646 should be updated to the newest version, i.e. ISO/IEC 10646:2003.

------------------------------------------------------------
Name of specification: XSLT 2.0 and XQuery 1.0 Serialization
Document: http://www.w3.org/TR/2005/WD-xslt-xquery-serialization-20050404/
Main reviewer: Felix Sasaki (fsasaki@w3.org)
------------------------------------------------------------

Comments:

[1] Sec. 2, point 3 (http://www.w3.org/TR/xslt-xquery-serialization/#serdm). "each separated by a single space": Inserting a space may not be the right thing, in particular for Chinese, Japanese, Thai, ... which don't have spaces between words. This has to be checked very carefully.

[2] Sec. 3, serialization parameter 'encoding' (http://www.w3.org/TR/xslt-xquery-serialization/#serparam). Given that this is already required for the XML output method, we think it's highly desirable to make the requirement for support for UTF-8 and UTF-16 general (including text output).

[3] Sec. 3, 'encoding'. Here or for each individual output method, something should be said about the BOM. As for the byte-order-mark parameter in sec.
3, you say "If the concept of a Byte Order Mark is not meaningful in connection with the value of the encoding parameter, the byte-order-mark parameter is ignored." We think that in sec. 3 or for each output method you could elaborate "meaningful" as follows:
- XML/XHTML: UTF-16: BOM required; UTF-8: may be used.
- HTML/text: UTF-16: BOM recommended; UTF-8: may be used.

[4] Sec. 3, 'encoding'. The respective sections for the individual output methods (5.1.2, 6.1.2, 7.4.2, 8.1.2) should say that for UTF-16, endianness is implementation-dependent (or implementation-defined).

[5] Sec. 3, 'encoding'. The respective sections for the individual output methods (5.1.2, 6.1.2, 7.4.2, 8.1.2) should say that, in the absence of an 'encoding' parameter, there should be a default of UTF-8.

[6] Section 3, 'include-content-type'. Please explain in more detail in this section or in the sections for XHTML (6.1.13) / HTML (7.4.13) why this parameter is necessary. It seems that it may be better to always include a respective <meta> element in XHTML / HTML.

[7] Sec. 4, point 2a (http://www.w3.org/TR/xslt-xquery-serialization/#serphases). You define URI-escaping in terms of XLINK. We propose to refer to section 3.1 of the IRI specification (RFC 3987) instead, because XLINK lacks a normalization procedure to NFC which might be a necessary step for mapping non-ASCII characters to ASCII characters.

[8] Sec. 5.1.2 (XML output method, encoding; http://www.w3.org/TR/xslt-xquery-serialization/#XML_ENCODING). "When outputting a newline character in the instance of the data model, the serializer is free to represent it using any character sequence that will be normalized to a newline character by an XML parser, unless a specific mapping for the newline character is provided in a character map: see 9 Character Maps." This should probably say that for interoperability, it is better to avoid x85 and x2028. See sec. 2.11 of XML 1.1 for further information.

[9] Sec.
5.1.5 (XML output method, omit-xml-declaration; http://www.w3.org/TR/xslt-xquery-serialization/#XML_OMIT-XML-DECLARATION). The interplay between omit-xml-declaration and the standalone parameter might disallow producing XML documents which are in an encoding other than UTF-8 or UTF-16 and have no XML declaration. Nevertheless this should be possible, e.g. if XML is served over HTTP with a corresponding charset parameter. Also, with XML 1.1, the XML declaration is mandatory, no matter what the values of omit-xml-declaration and standalone are.

[10] Sec. 6.1.12 (http://www.w3.org/TR/xslt-xquery-serialization/#XHTML_ESCAPE-URI-ATTRIBUTES) and Sec. 7.4.12 (http://www.w3.org/TR/xslt-xquery-serialization/#HTML_ESCAPE-URI-ATTRIBUTES), Note starting: "This escaping is deliberately confined to non-ASCII characters ...". There are certain ASCII characters that are not allowed in URIs, namely "<", ">", '"', space, "{", "}", "|", "\", "^", and "`". They should be escaped.

[11] Sec. 7.3 (HTML Output Method: Writing Character Data; http://www.w3.org/TR/xslt-xquery-serialization/#N10FE9). "When outputting a sequence of whitespace characters in the data model, within an element where whitespace is treated normally, (but not in elements such as pre and textarea) the html output method may represent it using any character sequence that will be treated as whitespace by an HTML user agent." We need to check whether this (which allows replacement of whitespace including linebreaks by whitespace not including linebreaks and vice-versa) is okay for Chinese, Japanese, Thai, ... (languages without spaces between words). This has to be checked extremely carefully.

[12] Sec. 8.1.13 (http://www.w3.org/TR/xslt-xquery-serialization/#TEXT_INCLUDE-CONTENT-TYPE). The text should talk about "include-content-type" instead of "escape-uri-attributes".
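
To make comments [7] and [10] on the serialization specification more concrete, here is a rough Python sketch of the kind of escaping we have in mind. It is only an illustration under assumptions: the function name escape_uri_attribute is made up, the set of ASCII characters is just the list from comment [10] (control characters are not handled), and the NFC step is applied unconditionally, which is a simplification of RFC 3987, sec. 3.1.

```python
import unicodedata

# ASCII characters named in comment [10] as not allowed in URIs.
# This is only that list, not a complete set of disallowed characters.
DISALLOWED_ASCII = set('<>" {}|\\^`')


def escape_uri_attribute(value: str) -> str:
    """Hypothetical helper: NFC-normalize, then percent-encode the UTF-8
    octets of every non-ASCII character and of the listed ASCII characters."""
    normalized = unicodedata.normalize("NFC", value)
    out = []
    for ch in normalized:
        if ord(ch) > 0x7F or ch in DISALLOWED_ASCII:
            # Each UTF-8 octet becomes one %HH triplet.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


# Precomposed and decomposed spellings give the same escaped result
# because of the NFC step.
print(escape_uri_attribute("http://example.org/caf\u00e9 menu"))
print(escape_uri_attribute("http://example.org/cafe\u0301 menu"))
```

Without the normalization step, the two spellings of the same text would be escaped to different octet sequences, which is the interoperability problem behind comment [7].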
------------------------------------------------------------
Name of specification: XML Syntax for XQuery 1.0 (XQueryX)
Document: http://www.w3.org/TR/2005/WD-xqueryx-20050404/
Main reviewer: Felix Sasaki (fsasaki@w3.org)
------------------------------------------------------------

Comments:

[1] Sec. 5 (http://www.w3.org/TR/2005/WD-xqueryx-20050404/#TrivialEmbedding), "If the XQuery contains characters that are prohibited in XML text (such as < and &), they must be "escaped" as either character entity references or character references." It should be made clear what is meant by "prohibited in XML text", e.g. XML-predefined entities.

[2] C.2. (http://www.w3.org/TR/2005/WD-xqueryx-20050404/#xqueryx-mime-registration), concerning various subsections. Editorial: Please add RFC 3023 in the reference section (Appendix A).

[3] C.2.1, encoding considerations, editorial. "The considerations as specified in RFC 3023 [XMLMIME] also hold for 'application/xquery+xml'." Please add a link to the section in RFC 3023 which deals with these considerations, i.e. sec. 3.2.

------------------------------------------------------------
Name of specification: XQuery 1.0: An XML Query Language
Document: http://www.w3.org/TR/2005/WD-xquery-20050404/
Main reviewer: Felix Sasaki (fsasaki@w3.org)
------------------------------------------------------------

Comments:

[1] General. How can xml:lang be extracted from data and preserved with a query? How can this be done without littering all elements with unnecessary xml:lang attributes? The function fn:lang, defined in the specification on functions and operators, provides some solution for the extraction of xml:lang, but not for its generation in the output. Something like the namespace-alias technique proposed by xslt 2.0 might be useful for this purpose, see http://www.w3.org/TR/2005/WD-xslt20-20050404/#namespace-aliasing

[2] General. There should be more non-US examples.
For example, it is very difficult for somebody not from the US to understand why there are no Deep Sea Fishermen in Nebraska.

[3] 3.7.1.3 Content (http://www.w3.org/TR/xquery/#id-content): serializing atomic values by inserting spaces may not be appropriate for Chinese, Japanese, Thai, ..., i.e. languages that don't use spaces between words. This has to be checked very carefully.

[4] Sec. 3.7.2 (http://www.w3.org/TR/xquery/#id-otherConstructors). Not requiring CDATA constructs to be serialized as CDATA sections is a good idea, because it helps dispel the idea that CDATA sections are semantically significant.

[5] For collations, namespaces, schemas, and so on, the production 141 "URILiteral" (sec. A.1 http://www.w3.org/TR/xquery/#id-grammar) is used, which refers to a "StringLiteral". "URILiteral" should be changed to "IRILiteral", and the reference section should contain an entry for the IRI specification, RFC 3987. There should also be a clear indication of how XML Base affects collations, namespaces etc.

[6] It is only implementation-defined whether XQuery supports XML 1.0 or XML 1.1 (http://www.w3.org/TR/2005/WD-xquery-20050404/#dt-implementation-defined). There should be a feature in XQuery which allows users to choose between these two versions of XML.

[7] C.3 Serialization Parameters (http://www.w3.org/TR/xquery/#id-xq-serialization-parameters). This table must be updated with the respective table from the serialization specification (http://www.w3.org/TR/xslt-xquery-serialization/#serparam).
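
To make the concern in comment [3] above (and in comment [1] on the serialization specification) concrete, here is a minimal Python sketch. It assumes a simplified model of the rule in which adjacent atomic values are cast to strings and joined with a single space; the function name join_atomic_values is made up for illustration, and this is not the algorithm as specified.

```python
def join_atomic_values(values):
    """Hypothetical helper: join the string forms of adjacent atomic
    values with exactly one space, as in the simplified rule assumed here."""
    return " ".join(str(v) for v in values)


# For English the inserted space is what a reader expects.
english = join_atomic_values(["deep", "sea"])

# For Japanese the same rule puts a space inside running text,
# where the language does not use spaces between words.
japanese = join_atomic_values(["東京", "都"])

print(english)
print(japanese)
```

Whether such an inserted space is acceptable depends on the language of the content, which is why we ask for this to be checked carefully.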
Received on Wednesday, 27 April 2005 12:14:18 UTC