- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 07 Jul 2003 11:41:25 -0400
- To: public-qt-comments@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear XML Query WG and XSL WG,
Below please find the I18N WG's comments on your last call document
"XQuery 1.0 and XPath 2.0 Data Model"
(http://www.w3.org/TR/2003/WD-xpath-datamodel-20030502/).
Please note the following:
- Please address all replies to these comments to the I18N IG mailing
list (w3c-i18n-ig@w3.org), not just to me.
- All i18n-relevant comments are marked with ***. There are also general
comments on the spec which we hope you will find useful.
- We have not yet reviewed the other documents, such as XQuery 1.0
or XSLT 2.0, and so we might be unaware of i18n issues that appear
in these specs but may have to be traced back to the data model.
There are also cases where we have identified an i18n issue here,
but we are not sure exactly what the best solution will be.
- Our comments are numbered in square brackets [nn].
We look forward to further discussion with you.
General:
[1] In general, this is a very extensive and rather boring document.
Where possible, it should be shortened and compacted, to make it
easier to get the relevant points.
[2] There are mappings between the following different things:
- properties of nodes of the data model
- the corresponding accessors
- the mapping from the PSVI (post-schema validation infoset)
to the properties
- the mapping from accessor output to XML infoset properties
(in the other draft we are reviewing, there are also functions
corresponding to the accessors).
At least one of these could easily be removed (e.g.
the properties or the accessors).
[3] 1. Intro: 'stylesheet or query' should be replaced by 'transform or query'
[4] 2. expanded-QName: Does this make it possible to handle special cases such as
XSLT that transforms XSLT, or XQuery that queries XQuery,...?
[5] 3.2 Document order: "The relative order of nodes in distinct documents
is implementation-dependent but stable. In other words, given two distinct
documents A and B, if a node in document A is before a node in document B,
then every node in document A is before every node in document B."
The second sentence sounds like a corollary from the first, but is
a non sequitur. It could as well be that an implementation decides
to order first all the first nodes from all the documents, then all
the second nodes, and so on. If indeed all nodes of one document
have to be before all nodes of another document, that should be
said explicitly, and not only as 'in other words'.
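To illustrate the point in [5], here is a minimal Python sketch of an
ordering that is stable and total, and that preserves the order within each
document, yet interleaves the nodes of two documents. The node labels and
the helper names (interleaved_key, contiguous_by_document) are hypothetical
and only serve this example; they do not come from the draft.

```python
# Hypothetical nodes: (document id, position within that document).
nodes = [("A", 0), ("A", 1), ("B", 0), ("B", 1)]


def interleaved_key(node):
    """Sort by position first, then by document id.

    The result is deterministic (stable across calls) but interleaves
    the nodes of different documents.
    """
    doc, pos = node
    return (pos, doc)


def contiguous_by_document(seq):
    """Return True if the nodes of each document form one unbroken run."""
    seen = set()
    previous = None
    for doc, _ in seq:
        if doc != previous and doc in seen:
            return False
        seen.add(doc)
        previous = doc
    return True


ordered = sorted(nodes, key=interleaved_key)
print(ordered)
print(contiguous_by_document(ordered))
```

An implementation using such an ordering would satisfy "implementation-dependent
but stable" without satisfying the second sentence of the quoted text.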
[6] 3.3, markup of [Definition]. Using square brackets for indicating
definitions doesn't look good at all. Also, there should not
be a period before and after the closing ].
[7] *** 3.3 data model support of values that are not supported
by the XML Infoset: What about pcdata with an associated
language information? What about document fragments with
associated inherited attributes in general? RDF is dealing
with such things, and it would be very good if they could be handled.
[8] *** The handling of inherited attributes in general is an important
issue for I18N (because of xml:lang) that wasn't dealt with at
all in XSLT 1.0. Apart from what may be needed in the data model,
support is also important on a higher level.
[9] 3.3 "The data model supports incompletely validated documents, but
inconsistent data models are forbidden."
What is an inconsistent data model? What actually happens when there
is such a model? Does an error get thrown?
[10] 3.4 "In either case, the type names must also appear in the In-scope
Schema Definitions (as defined in [XPath 2.0]) available to the processor."
'type name' or rather 'type definitions' or 'types'? There are
anonymous type names, but it seems strange to say that these appear
somewhere.
[11] *** anyType, anySimpleType, anyAtomicType, untypedAtomic, string, text
nodes:
This is a very general concern, but very important for
internationalization. There seems to be a proliferation of type
variants dealing with the simplest of things in XML, namely simple
text. This seems to ruin quite a bit of the benefits of using
Unicode; now that we have solved the character encoding problems,
we don't want to create arbitrary differences for simple pieces of
text. But various specs (e.g. also RDF) seem to come up with
additional ways of creating arbitrary differences.
anyAtomicType and untypedAtomic seem to be badly explained
and justified. We have to make sure that whenever possible, there
are no arbitrary boundaries in functionality. Rather than treating
string, text nodes, and untyped as three completely different
things, they should work as much as possible in an overloaded
way similar to the number operators.
[12] *** 3.6.1 date and time mappings: for things with timezones,
canonicalizing the time zone and then representing the original time
zone separately
seems to make sense. But for values without a timezone, representing
them as if they were in UTC is inherently wrong and will lead to
a lot of misunderstandings. (having things with timezones and things
without timezones as separate types would have been the better
solution originally, and maybe it's still not too late for that)
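As a rough illustration of the concern in [12], the following Python sketch
uses the standard datetime module, which happens to keep values with and
without a timezone distinct. The sample timestamps and the helper
pretend_utc are assumptions made for this example only.

```python
from datetime import datetime, timedelta, timezone

# A value with an explicit timezone can be canonicalized to UTC while the
# original offset is kept separately.
aware = datetime.fromisoformat("2003-07-07T11:41:25-04:00")
canonical = aware.astimezone(timezone.utc)
original_offset = aware.utcoffset()

# A value without a timezone does not identify an instant at all.
naive = datetime.fromisoformat("2003-07-07T11:41:25")


def pretend_utc(value):
    """Treat a timezone-less value as if it were in UTC.

    This is the practice that comment [12] objects to.
    """
    return value.replace(tzinfo=timezone.utc)


# After pretending, the two values appear to be four hours apart, although
# nothing in the timezone-less data says so.
difference = canonical - pretend_utc(naive)
print(canonical.isoformat(), original_offset, difference)
```

Keeping the two kinds of values apart, as this library does, makes the
pretence an explicit step rather than a silent default.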
[13] 3.6.1, editorial: "Lexical representations that do not have a timezone
are assumed to be in UTC for the purposes of normalization." ->
"Lexical representations that do not have a timezone are assumed to be
in UTC for the purposes of normalization ONLY."
[14] 4.1.6 typed-value: It would be good to have some explanation of what
the idea/purpose of this accessor is. It seems to be strange that
some cases produce errors. Why does mixed content produce a string,
but complex content, a subset of mixed content, produce an error?
[15] *** 4.1.8 children Accessor: "The sequence of children will never
contain adjacent text nodes." (see also 4.2.1)
It is good that text nodes are always merged. But this should
be stated as a property of the data model, not just mentioned
in an accessor description.
4.2.1 "The children must consist exclusively of element, processing
instruction, comment, and text nodes if it is not empty. Attribute,
namespace, and document nodes can never appear as children"
[16] - 'if it is not empty' seems irrelevant; obviously an empty document
won't contain any nodes of other types either.
[17] - There should be a period at the end
[18] 4.2.1, "Implementations that support DTD processing and access to the
unparsed entity accessors, use the unparsed-entities property to associate
information about an unordered collection of unparsed entities with a
document node."
spurious comma
[19] 4.2.2 typed-value: why does document return the string value, but
any of its elements could return an error?
[20] *** 4.2.2 and many other places: As far as we understand from previous
discussions, xs:string is often used instead of xs:anyURI for
convenience (to avoid additional casts). It is important in these
cases to clearly state that the values actually have to be anyURIs,
AND are treated according to anyURI syntax.
[21] *** 4.2.4: [character encoding scheme]: "The values of these
properties are implementation-defined but must be consistent with the rest
of the Infoset constructed."
What does 'consistent' mean here? There is a dependency between
non-ASCII element/attribute/... names and the encoding chosen.
But for a data model that produces an infoset that is (not yet)
intended for serialization, it almost seems that any specific
value would be inappropriate. On the other hand, when actually
being written out, at least for XSLT, the property is not
implementation-dependent, but determined by the <output>
element. So we suggest the following text:
"irrelevant during processing, determined by XQuery or XSLT
for output"
[22] *** 4.3.1: processing instructions and comments: Is there a way
to ignore these (if not in the data model, then in XQuery and XSLT?)
Because they are not part of the actual text, ignoring them
is often desirable. In that case, the text nodes should merge
automatically.
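The behaviour asked for in [22] (and the merging rule mentioned in [15])
could look roughly like the following Python sketch. The tuple
representation of nodes and the function merge_ignoring are hypothetical;
they are not part of the data model or of any XQuery/XSLT API.

```python
def merge_ignoring(children, ignored=("comment", "pi")):
    """Drop ignored node kinds, then merge text nodes that became adjacent.

    children is a list of (kind, value) tuples, a simplified stand-in
    for the children of an element node.
    """
    result = []
    for kind, value in children:
        if kind in ignored:
            continue
        if kind == "text" and result and result[-1][0] == "text":
            # Merge with the preceding text node instead of appending.
            result[-1] = ("text", result[-1][1] + value)
        else:
            result.append((kind, value))
    return result


children = [
    ("text", "inter"),
    ("comment", " check spelling "),
    ("text", "national"),
    ("element", "br"),
    ("text", "text"),
]
print(merge_ignoring(children))
```

Without the merge step, a search for "international" in the text nodes
would fail only because a comment happened to sit in the middle of the word.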
[23] 4.3.2 "If the element node's type is xs:anyType, the
dm:typed-value accessor returns the node's string value as
xs:anySimpleType. If the type is a complex type with complex content,
invoking dm:typed-value raises an error."
Doesn't anyType include complex types?
[24] 4.3.2: One additional accessor: Why is this accessor not listed in the
table?
[25] 4.3.3: Are xml:base attributes treated as special attributes or like
namespace declarations?
[26] 4.4.1: "Attribute nodes encapsulate XML attributes": 'represent' may be
better than 'encapsulate'.
[27] 4.4.2: The details about typed-value are useless duplications. It would
be better to specify this very clearly in one single place, and
just point to it from other places.
[28] 4.4.3: "The xs:QName IS computed..."
[29] 4.4.4: [owner element] -> [parent]
[30] ***4.5.1: uri -> anyURI (or an equivalent explanation)
[31] ***4.8.3: "The string-value is not W3C normalized as described in the
Character Model for the World Wide Web version 1.0 draft."
This may be misunderstood as saying that the string value has to be
non-normalized. It should at least be clarified as follows:
"The string-value is not necessarily W3C normalized as described
in the Character Model for the World Wide Web version 1.0 draft.
It is the responsibility of data providers to provide appropriately
normalized text, and the responsibility of programmers to make
sure that operations do not de-normalize text."
Even better clarification, in particular of the first sentence,
is highly desirable, to clearly say that this refers to a state,
and not an action.
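The distinction between a state and an action, and the way an operation can
de-normalize text, can be sketched in Python as follows. Unicode NFC is used
here only as an approximation of the normalization described in the
Character Model draft (an assumption of this example), and the helper
is_nfc is hypothetical.

```python
import unicodedata


def is_nfc(text):
    """Check a state: is the text already in NFC? Nothing is modified."""
    return unicodedata.normalize("NFC", text) == text


base = "e"
combining = "\u0301"  # COMBINING ACUTE ACCENT

# Each piece on its own passes the NFC check...
print(is_nfc(base), is_nfc(combining))

# ...but plain concatenation yields text that is no longer in NFC.
joined = base + combining
print(is_nfc(joined))

# Normalizing is a separate action that produces the composed form.
normalized = unicodedata.normalize("NFC", joined)
print(normalized == "\u00e9")
```

This is why the proposed wording assigns responsibility both to data
providers and to programmers whose operations combine strings.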
[32] 5. "The values of nodes whose type is derived by union from an XML
Schema primitive type are represented by a sequence of atomic values each
of whose type is one of the individual types from the union. The union type
information is lost and only the specific types of each individual item is
retained."
This seems to apply to lists of unions, or maybe unions of lists,
but not to simple unions. This should be clarified.
[33] 5. "Using the canonical lexical representation for atomic values may
not always be compatible with XPath 1.0.": Please say when this is not the
case.
D. Example:
[34] *** xml:lang should be used in the instance, not only appear
in the schema (and in the schema be allowed higher-up so
that it can be inherited)
[35] *** Defining a default currency in the schema is bad design
practice. Without the schema, the data is basically useless.
Please choose something different for an example of default attribute
handling.
[36] *** The monetaryAmount type works well for some currencies
(USD, EUR,...), but does not work for others (Yen,...).
Please generalize. The number of fractional digits needed
currently is 0, 2, or 3.
For details, please see:
http://www.bsi-global.com/Technical+Information/Publications/_Publications/tig90x.doc
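A generalized amount type along the lines of [36] might be sketched in
Python as below. The table FRACTION_DIGITS and the function quantize_amount
are hypothetical; the digit counts shown are assumed to follow ISO 4217 and
cover only a few currencies for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative number of fractional digits per currency code
# (assumed to follow ISO 4217; not an exhaustive table).
FRACTION_DIGITS = {"JPY": 0, "USD": 2, "EUR": 2, "KWD": 3}


def quantize_amount(amount, currency):
    """Round an amount to the number of fractional digits of its currency."""
    digits = FRACTION_DIGITS[currency]
    step = Decimal(1).scaleb(-digits)  # 1, 0.01 or 0.001
    return Decimal(amount).quantize(step, rounding=ROUND_HALF_UP)


print(quantize_amount("1234.5", "JPY"))
print(quantize_amount("19.999", "USD"))
print(quantize_amount("1.2345", "KWD"))
```

A schema type that hard-codes two fractional digits would reject or distort
the first and third of these amounts.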
[37] *** The pop-culture example may make it difficult for non-native
readers to understand the example, or to create a reasonable
translation.
[38] - "Literal strings are shown without the xs:string() constructor"
this should say that strings are shown in quotes
[39] - Why are N1-N5 before P1 and E1?
[40] - A4: why is typed-value xs:token?
[41] - typed-value of E5: inconsistent.
[42] - other inconsistencies include: children(E5)->T2,
string-value(A7), (A8), (A9), (A10), (A11) (string and typed
values seem out of sync)
[43] - Graphic representation of the data model. [large view]: This
should be provided in SVG
Regards, Martin.
Received on Monday, 7 July 2003 11:42:55 UTC