F&O WD comments from Jeni Tennison on 2001-09-14 (www-xml-query-comments@w3.org from September 2001)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Fri, 14 Sep 2001 14:12:18 +0100
To: Jim Melton <jim.melton@acm.org>
CC: www-xml-query-comments@w3.org
Message-ID: <1747530388.20010914141218@jenitennison.com>

Hi Jim,

I've been through the F&O WD with a fairly fine-toothed comb and come
up with the following comments/suggestions/questions. I hope that
they're useful.

2 Constructors, Functions and Operators on Numbers

2.3 Numeric Constructors. According to the Signatures, the numeric
constructors xf:long(), xf:int(), xf:short() and xf:byte() all return
integers rather than longs, ints, shorts and bytes. Is this correct or
is it a copy/paste error? This section mentions only some of the types
in the numeric type hierarchy. Why aren't there constructors for
nonPositiveInteger, negativeInteger, nonNegativeInteger,
positiveInteger, unsignedLong, unsignedInt, unsignedShort and
unsignedByte as there are for the other built-in derived types?

2.3.4.4 Examples (for xf:int()) - the second example states that the
value isn't valid for the *short* type, when it means the *int* type.

2.3.8.3 Semantics (for xf:double()) states that in XPath 1.0,
double('INF') returned NaN - there's no double() function in XPath 1.0
but it's a correct assertion for number('INF').

2.4 Operators on Numeric Functions. The first sentence of the
paragraph after the table of operators states that all operators
return one of the three types - float, double or decimal, yet most of
the rest of the section seems to treat integer as one of these
primitive numeric types (although the final example, byte*short
promotes both integer-derived values to decimal rather than integer).
Issue 105 (XPath 1.0 compatibility broken for div) would not be a
problem if integers were always promoted to decimal rather than
integers.

2.5 Comparisons of Numeric Values. The beginning of the first
paragraph states that the comparison operators are defined for numeric
values, but the end of the paragraph talks about comparisons for
sequences, which is confusing.

3 Constructors, Functions and Operators on Strings.

3.3.3 xf:token. Carriage return (#xD) characters aren't mentioned, but
I believe that they should be. As per my previous mail regarding
whitespace within the strings for numeric constructors, I believe that
the strings for constructing tokens should be able to contain line
feeds, tabs and carriage returns without returning an error; the
string should have whitespace collapsed to form the token value. A
similar process should apply to normalizedString, I think (although
I'm confused by the XML Schema Datatypes Rec. on this, particularly
its distinction between line break vs carriage return & tabs).
Regarding Issue 46 (xf:token: Should other Unicode space characters be
considered?) I think the answer is no, because you're mirroring the
XML Schema datatypes Rec, which doesn't consider other Unicode space
characters.
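
A minimal sketch of the two whiteSpace treatments discussed above, as I
read them from XML Schema Datatypes: 'replace' turns each tab, line feed
and carriage return into a space, and 'collapse' additionally squeezes
runs of spaces and trims the ends. The function names ws_replace and
ws_collapse are made up for illustration and are not part of the F&O WD.

```python
import re


def ws_replace(s: str) -> str:
    """whiteSpace=replace: each tab, line feed and carriage return becomes a space."""
    return re.sub(r"[\t\n\r]", " ", s)


def ws_collapse(s: str) -> str:
    """whiteSpace=collapse: replace, then squeeze runs of spaces and trim both ends."""
    return re.sub(r" +", " ", ws_replace(s)).strip(" ")


# A normalizedString-style value keeps its length; a token-style value is collapsed.
print(repr(ws_replace("a\tb\r\nc")))
print(repr(ws_collapse("  a\tb\r\n c  ")))
```

Only the four XML whitespace characters are touched here; other Unicode
space characters are left alone, in line with the answer suggested for
Issue 46.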

In general, would it be a good idea to refer to the legal strings for
the construction of these datatypes as the lexical types defined
within XML Schema Datatypes, rather than repeating the definitions
from there within the F&O document (which might lead to maintenance
problems)?

3.3.10 xf:ENTITY - you (perhaps wisely) don't mention the requirement
for the strings for constructing ENTITY types to be the names of
unparsed entities declared in a document type definition. This may be
wise because the question that would arise is where the unparsed
entity must have been declared, which is difficult. In terms of a
stylesheet, for example, should it be an entity declared in the
stylesheet document or in the source document for the stylesheet? What
if you have several source documents? However, without adding this
requirement, there's a risk of constructing invalid ENTITY values.
Could you add a note or something to clarify?

3.4 Equality and Comparison of Strings contains a paragraph starting:
"The [Character Model for the World Wide Web 1.0] recommends that all
strings be normalized early and, thus, string comparisons need only be
defined on normalized strings." It's not clear here whether you're
talking about whitespace normalization or Unicode normalization. I'd
suggest that constructors/casts normalize according to the whiteSpace
facet of the relevant data type (i.e. early), and that no further
normalization is carried out automatically (for example, a user would
have to explicitly normalize strings to compare the
whitespace-collapsed versions of the strings).

3.5 Functions on String Values - xf:starts-with() comes from XPath
1.0. Also, the xf:normalize() function listed in the table doesn't
exist - I think it's a typo for xf:normalize-unicode(), which isn't
listed.

3.5.1 Usage Notes - the first paragraph states that the resulting
string must be normalized. Again, it's not clear whether it means
whitespace normalization or Unicode normalization. I'd argue that they
should not be whitespace normalized, because it doesn't make sense
for:

  xf:concat('   ', '   ') => ''

I'm not sure about Unicode normalization.

3.5.2 xf:concat. The concat() function in XPath 1.0 takes a minimum of
two arguments, whereas xf:concat() as described has a minimum of one
argument. For compatibility, xf:concat()'s signature should be
changed.

3.5.3 xf:starts-with, 3.5.4 xf:ends-with, 3.5.6 xf:contains. As far as
I understand it, collations are about the *ordering* of
characters/strings. I can see how the collation that you use has an
effect on the comparison between two strings, but how does it affect
whether one string contains another string? Either a string starts
with another string or it doesn't, it doesn't matter how you judge how
they sort relative to each other. The only thing I can think of is
that you're talking about the normalization used on the strings rather
than the collation used to order them, in which case I think that
normalization should be left out of the equation - if someone wants to
normalize strings, they can use the specific functions available to do
so.

3.5.5 xf:codepoint-contains, 3.5.9 xf:codepoint-substring-before,
3.5.11 xf:codepoint-substring-after. Each of these functions has a
non-codepoint version. I'd suggest just the basic xf:contains(),
xf:substring-before() and xf:substring-after() functions all work on a
character-by-character basis. If someone wants to call the functions
on normalized strings, they can use xf:normalize-unicode() to do so.
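
To illustrate the difference, here is a rough Python sketch (not the
WD's definition) of a plain codepoint-by-codepoint contains next to one
that normalizes both strings first. The names codepoint_contains and
normalized_contains are hypothetical; unicodedata.normalize is the
standard Python library call.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent


def codepoint_contains(haystack: str, needle: str) -> bool:
    """Compare code point by code point, with no normalization at all."""
    return needle in haystack


def normalized_contains(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Normalize both strings explicitly, then compare code points."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


# The same text gives different answers depending on how it was encoded,
# unless the caller asks for normalization.
print(codepoint_contains(decomposed, "\u00e9"))
print(normalized_contains(decomposed, "\u00e9"))
```

With this split, the basic function stays simple and the caller decides
whether normalization is wanted.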

3.5.7 xf:substring. I think that the signature that you've specified
here is not compatible with XPath 1.0 because XPath 1.0 numbers are
equivalent to the double data type, not the decimal data type. Also,
getting an effective value by casting to unsignedInt restricts the
maximum value of the second and third arguments and thus the length of
the strings that you can deal with using xf:substring(). There's no
restriction on the length of strings in either XPath 1.0 or XML
Schema, so I think it's wise to cast to nonNegativeInteger rather than
unsignedInt. It seems strange that you have to pass a decimal number
to the function and yet the decimal will get converted to another
value type (which you wouldn't have been able to pass directly). I
think that this document should describe the functions in terms of the
argument types that they can actually *use* (i.e. the effective
values), and let XQuery/XPath decide what they will let the functions
*accept*. So for example here you might define xf:substring as:

  xf:substring(string $sourceString,
               positiveInteger $startingLoc,
               positiveInteger $length)

XQuery might state that other numeric arguments are converted to
positiveIntegers; XPath might go beyond that and say that all value
types are converted to positiveIntegers.

3.5.13 xf:normalize-space. The strings returned by this function are
valid xs:tokens - perhaps it should return a token rather than a
string?

3.5.14 xf:normalize-unicode. I don't think that there are any other
functions that accept a string value that has to be one of a certain
set of strings. To keep it in line with the syntax used for other
functions, perhaps there should be different functions
(xf:normalize-NFC(), xf:normalize-NFD(), xf:normalize-W3C()) or the
normalization form should be referred to using a QName or something,
so that people can define their own? Also, I'm no Unicode expert, but
I'm confused about why Normalization Forms KC and KD aren't included
in the set of normalizations allowed? Finally, could there be a
default for the normalization form that's used, such that the second
argument was optional?

3.5.15 xf:upper-case and 3.5.16 xf:lower-case. These functions will be
a real boon if defined and implemented properly (I also think that
xf:title-case would be useful, though it might be harder to define
because of the way different languages do capitalisation). It would be
good to revise the wording so that it's clear that a single character
might map to several characters when the case is converted. For
example, the lower-case 'ß' should map to the two upper-case
characters 'SS'. The length of the resulting string might be different
from the length of the initial string.
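
As a quick illustration of the one-to-many mapping, Python's built-in
str.upper() applies the full Unicode case mapping, so it can stand in
for what a well-defined xf:upper-case() might do (that correspondence is
my assumption, not something the WD states):

```python
word = "stra\u00dfe"   # 'straße'
upper = word.upper()   # full case mapping: 'ß' becomes 'SS'

# The result is one character longer than the input.
print(upper, len(word), len(upper))
```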

Talking about case - proper case-insensitive comparisons and
case-insensitive functions could be really useful. In particular for
xf:compare(), xf:contains(), xf:starts-with(), xf:ends-with(),
xf:substring-before(), and xf:substring-after(). Perhaps the treatment
of case is what you were aiming to gather from the collations in these
cases? I wonder whether it's worth having the concept of a
unicode-format (akin to decimal formats in XSLT) or something that
wraps together the concepts of normalization, collation and case
conversions?

3.5.18 xf:string-pad-beginning and 3.5.19 xf:string-pad-end. The
current design doesn't account for instances where the padding string
includes several different characters. It also doesn't include a means
of aligning a string centrally within a padding string. Take a look at
the EXSLT str:padding() and str:align() functions for a different
design (http://www.exslt.org/str/). As with xf:substring() I think
that the second argument should be cast to a nonNegativeInteger rather
than an unsignedInt. I also don't understand why the signature shows
the second argument being a decimal - there isn't the issue of
compatibility with XPath 1.0 here (though perhaps you were aiming for
consistency?)

3.5.21 xf:replace. This will be really useful as far as it goes, but
it doesn't address one of the common use cases in XSLT, which is when
people want to replace line break characters with <br /> elements. One
thing that we're experimenting with in EXSLT is to have str:replace()
mirror the translate() function, but with strings/nodes rather than
characters. So the second argument ($matchseq) is a sequence of
strings (or you could have regular expressions) and the third argument
($repseq) is a sequence of nodes or strings. All occurrences of string
number N in $matchseq are replaced by node or string N in $repseq. If
both approaches are useful perhaps there should be two functions.

4 Constructors, Functions and Operators on Booleans

4.1.3 xf:boolean-from-string. The valid lexical values for the
xs:boolean data type are 'true', 'false', '1' and '0'. 'True' and
'fAlse' are not valid xs:boolean values. I suggest changing
xf:boolean-from-string() so that (a) it only accepts lower-case 'true'
and 'false' and (b) it accepts '1' and '0' as well. This would mean
that a valid boolean value in an XML document could be used to
construct a boolean.
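
A minimal sketch of the suggested behaviour, assuming the function
simply accepts the four xs:boolean lexical values and rejects everything
else (the Python name and the choice of ValueError for the error case
are mine):

```python
def boolean_from_string(s: str) -> bool:
    """Accept exactly the xs:boolean lexical forms; matching is case sensitive."""
    if s in ("true", "1"):
        return True
    if s in ("false", "0"):
        return False
    raise ValueError(f"not a valid xs:boolean lexical value: {s!r}")


print(boolean_from_string("1"), boolean_from_string("false"))
```

Under this sketch 'True' and 'fAlse' raise an error rather than being
quietly accepted.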

5 Constructors, Functions and Operators on Dates and Times

5.2.1 xf:duration. You might want to mention something about the
normalization of durations. For example, is xf:duration('PT1M') equal
to xf:duration('PT60S')?

5.2.5.4 Examples (xf:gYearMonth). Both of the examples should return
an error since neither are valid gYearMonth values. They should be
'2001-04' and '2001-04Z'.

5.2.7.4 Examples (xf:gMonthDay). Both of the examples should return an
error since neither are valid gMonthDay values. They should be
'--12-25' and '--12-25Z'.

5.2.8.4 Examples (xf:gMonth). Both of the examples should return an
error since neither are valid gMonth values. They should be '--10--'
and '--10--+02:00'.

5.2.9.4 Examples (xf:gDay). Both of the examples should return an
error since neither are valid gDay values. They should be '---13' and
'---14+02:30'.
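
The corrected values in the four comments above can be checked with a
rough lexical test. This is only a sketch: the regular expressions
encode the shapes quoted in this mail (including '--MM--' for gMonth),
they do not check month or day ranges, and the helper name
has_lexical_shape is invented for the example.

```python
import re

TZ = r"(Z|[+-]\d{2}:\d{2})?"   # optional timezone suffix

# Lexical shapes only; value ranges (month 01-12 etc.) are not checked.
PATTERNS = {
    "gYearMonth": re.compile(r"-?\d{4,}-\d{2}" + TZ),
    "gMonthDay": re.compile(r"--\d{2}-\d{2}" + TZ),
    "gMonth": re.compile(r"--\d{2}--" + TZ),
    "gDay": re.compile(r"---\d{2}" + TZ),
}


def has_lexical_shape(type_name: str, s: str) -> bool:
    """True if s has the lexical shape assumed here for the given type."""
    return PATTERNS[type_name].fullmatch(s) is not None


print(has_lexical_shape("gMonthDay", "--12-25Z"))
print(has_lexical_shape("gMonthDay", "12-25"))
```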

5.2.10 xf:currentDateTime(). It's really good that this specifies the
same value is returned during the evaluation of a particular XQuery or
XPath - I'd like it to have the value fixed within an entire XSLT
transformation (perhaps generalize to 'execution context' or
something?). The other thing that isn't mentioned here is the timezone
that's used for the dateTime that's created. Should it be UTC or use
the timezone of the locale of the transformation?

5.3 Comparisons of Duration and Datetime Values. This is obviously
problematic because of the indeterminate ordering. Using functions
rather than operators would fix the problem, but be less user-friendly
than using operators. Another possibility would be that all the
comparisons return true if the comparison is indeterminate. If
necessary, you could find out whether it was actually indeterminate
with things like:

  $dateTime1 = $dateTime2 and $dateTime1 != $dateTime2

5.4 Component Extraction Functions. The names of these functions are a
bit misleading. For example, I would expect xf:get-gDay-from-dateTime
to return a xs:gDay value rather than an integer.
  
5.4.11.4 Examples (xf:get-gMonth-from-gMonthDay), 5.4.14.4 Examples
(xf:get-gDay-from-gMonthDay). A valid gMonthDay would be '--05-31'.

5.4.19 xf:get-seconds-from-dateTime. I think that seconds can only go
up to 60.9...9 (as in xf:get-seconds-from-time) not to 61.9...9?

5.4.22-28 xf:get-timezone-from-XXX. Accessing the timezone as a string
means that you then have to use substring functions to work out the
number of hours and minutes difference, which is less than ideal
(unless your only goal is to display the date/time). Perhaps it should
be returned as a duration? Or perhaps there should be separate
functions to access the timezone hours and minutes? If the timezone
*is* returned as a string, what is returned by date/times in the UTC
timezone: 'Z' or '00:00'? I also don't understand why they return an
empty sequence if there's no timezone, rather than an empty string,
given that the return type is a string.

5.6 Arithmetic Functions on Dates. I don't think that these functions
are useful given that you have a xf:get-end() function (or if you
allow + and - as suggested in Issue 96, which I also think is more
intuitive).

5.6.1 xf:add-days. The second argument is defined as a decimal number,
and yet the function returns a date. What happens if you call
xf:add-days(xf:date('2001-02-08'), 1.5)? Perhaps the first argument
and return value should be dateTime values, or the second argument
should be an integer?

5.6.2 xf:add-months and 5.6.3 xf:add-years. The same comment as for
xf:add-days applies here. Also, what about adding months to gYearMonth
values and adding years to gYearMonth and gYear values?

5.7 Functions on TimePeriods. There are several useful functions
missing which might be included here:

 - a means of normalizing dates and times to UTC dates and times
 - a means of adding/subtracting timezones to UTC dates and times
 - a means of normalizing durations
 - a means of parsing date/time strings in other formats (e.g.
   xf:parse-date('13-09-2001', 'dd-MM-yyyy') = xf:date('2001-09-13'))
 - a means of reformatting date/time strings in other formats (e.g.
   xf:format-date(xf:date('2001-09-13'), 'dd-MM-yyyy') => '13-09-2001')
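
To show the kind of round trip the last two items have in mind, here is
a small Python analogue. The functions parse_date and format_date are
hypothetical stand-ins for the proposed xf:parse-date() and
xf:format-date(), and Python's '%d-%m-%Y' plays the role of the
'dd-MM-yyyy' picture string used in the examples.

```python
from datetime import date, datetime


def parse_date(text: str, fmt: str) -> date:
    """Parse a date written in a non-ISO format into a date value."""
    return datetime.strptime(text, fmt).date()


def format_date(d: date, fmt: str) -> str:
    """Write a date value out in a non-ISO format."""
    return d.strftime(fmt)


d = parse_date("13-09-2001", "%d-%m-%Y")
print(d.isoformat(), format_date(d, "%d-%m-%Y"))
```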

8 Functions and Operators on base64Binary and hexBinary.

Why aren't there any constructor functions for these value types?

10 Functions and Operators on Nodes.

10.1 Operators on Nodes. Do == and !== work in the same way as = and
!= in terms of behaviour with sequences, or do they only look at the
first node in the sequence? For example, does {$node1, $node2} ==
{$node2, $node3} return true or false? Does {$node1, $node2} !==
{$node2, $node3} return true or false?

10.2.8 xf:copy and 10.2.9 xf:shallow. These functions seem to be about
*generating* a document rather than *querying* a document. I don't
think they should be included in this document.

11 Constructors, Functions and Operators on Sequences.

11.2.1 TO. Other operators use lowercase - why does TO use uppercase?
Why are the two operands effectively converted to unsignedInt values
rather than positiveInteger values (which would allow longer sequences
to be selected, and also prevent allowing 0 in the generated
sequence), and why define the operands as decimals if they get cast to
integers?

11.4.1 xf:position. This description will only be backwards compatible
with XPath 1.0 if both arguments are optional, the first defaulting to
the context node list and the second to the context node. There's a
missing ) in the example.

11.4.2 xf:last. This seems to be the same as count() in XPath 1.0 (and
xf:count() in this document) rather than last() in XPath 1.0.

11.4.3 xf:item-at. Won't predicates be allowed on sequences? If they
are, then xf:item-at() gives the same functionality as positional
predicates, e.g. xf:item-at({1, 2, 3}, 2) = {1, 2, 3}[2].

11.4.4 xf:index-of. This seems to incorporate the described
functionality of xf:position().

11.4.6 xf:exists. It seems that xf:exists($seq) gives exactly the same
result as not(xf:empty($seq))? One or the other is superfluous. Both
are superfluous if conversion of a sequence to a boolean tells you
whether it contains anything or not, as in XPath 1.0.

11.4.15 xf:sequence-pad-beginning and 11.4.16 xf:sequence-pad-end. As
with the similar functions on strings, it might be useful to split
these into functions that create padding and those that create
sequences of particular lengths using the padding.

11.4.17 xf:truncate-beginning and 11.4.18 xf:truncate-end. Aren't
these doable using predicates and the to operator? For example,
xf:truncate-end($seq, 3) is the same as $seq[1 to 3].

11.4.21 xf:unordered. I don't understand how this function works. It
would benefit from some examples.

11.5.1 xf:sequence-value-equal and 11.5.2 xf:sequence-node-equal. I
think these should either return boolean true or boolean false, not
empty sequences.

11.6 Aggregate Functions. Removing node duplicates for these functions
makes sense for compatibility with XPath 1.0 (although perhaps that
should be arranged during the construction of the node sequences
rather than when the node sequence is actually used). However, for
sequences of data values it doesn't make sense. For example, I might
have:

  <essays marks="55 72 67 62 55" />

and want to get the average of the marks achieved on the essays. If
you remove duplicates, the average will be higher than it actually
should be (in this case).
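
The arithmetic for that example, sketched in Python (the variable names
are mine; removing duplicates is modelled with a set):

```python
marks = [int(m) for m in "55 72 67 62 55".split()]

# Average over every mark, duplicates included.
true_average = sum(marks) / len(marks)

# Average after duplicate values have been removed: one of the 55s is lost.
distinct = set(marks)
deduplicated_average = sum(distinct) / len(distinct)

print(true_average, deduplicated_average)
```

The first figure is 311/5 and the second is 256/4, so dropping the
repeated 55 pushes the result up.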

11.6.3 xf:max and 11.6.4 xf:min. What happens if the sequence contains
dates or times whose order is indeterminate? What do these functions
return with node sequences? A node or a value of a node?

11.7.1 xf:id. I'm not sure about the compatibility between this
function and id() in XPath 1.0, which can tokenise strings that it
takes as an argument to create sequences of IDs.

11.7.3 xf:filter. Like xf:copy() and xf:shallow(), this seems to be
about *constructing* rather than *querying* a document.

11.7.4 xf:document. This description isn't compatible with the XSLT
document() function, which can take two sequences as arguments.

12 Casting Functions.

The description in the first paragraph implies that the strings used
in the constructor functions can only be declared as literals - is
that correct? If so, they are unlike all other functions. This should
be indicated somehow in their description, and the data model should
make a distinction between a string included literally in an
expression and a string that's returned by a function/operator. I
don't think that constructors should be constrained in this way
(although that means that there's not much point to them).

It would be useful to include within this section a description of how
and whether sequences can be cast to the various data types. In XPath
1.0, for example, a node set is cast to a string by taking the string
value of the first node in the node set. Will something similar apply
here? The current tables suggest that sequences (e.g. IDREFS,
ENTITIES) can only be cast into an atomic data type if they hold a
single value.

Perhaps the descriptions of casting would be simplified if it was
described in general terms: first casting to the target type's parent
type, and then converting from that to the target type. You could
describe what is done to the values on conversion in terms of the
different facets (e.g. it's an error if it doesn't match the pattern;
round the decimal number if it contains too many fractionDigits). This
would cover casting between user-defined types as well as the built-in
types. The only casts that you would then have to describe explicitly
are the casts between the various primitive types. I'm not sure
whether this scheme would work, but a general guideline with lots of
exceptions is easier to understand than lots of specific definitions.

12.1 Casting to string and its derived types.

 * In casting durations to strings, is the duration normalized at
   all? I know that no canonical representation is defined in XML
   Schema, but you could define one that only included components for
   months or for seconds, or for both. If a component is 0, is that
   included in the duration string or not?

 * It's not clear to me how the two binary formats are cast
   to strings. Are the bytes interpreted as characters, or is the
   string the encoded version of the byte?

 * When casting to normalizedStrings, the WD states that line feeds,
   carriage returns, tabs and spaces are *removed*. This is a peculiar
   form of normalization - it would make more sense to have them
   replaced by spaces (i.e. whiteSpace replace). (Issue 84 applies to
   normalizedString as well as token.)

 * When casting to language, I think the SV should be converted first
   to a *token* rather than a *NMTOKEN*, since token is the parent
   type of language, not NMTOKEN. The table shows that Name->language
   casting as Y - I think it should be M because not all Names are
   languages.

 * When casting to Name, then I think that if ST is any of the
   derived types of Name then the conversion is also complete (i.e.
   NCName, ID, IDREF, ENTITY). The SV should be converted first to a
   *token* rather than a *string*, as this is the parent type of Name.
   The table suggests that a NMTOKEN source can always (Y) be cast to
   a Name, which is not correct (e.g. '123' is a valid NMTOKEN but not
   a valid Name). It should be labelled 'M'. I think that the lexical
   representations of durations are always valid Names, so perhaps that
   should be 'Y'. The strings 'true' and 'false' are both valid Names,
   so I think casting from boolean should be labelled 'Y'. anyURI
   values might be valid Names (if they're relative) so it should be
   labelled 'M', I think. QNames are always valid Names (so should be
   marked as 'Y'). Notations are always valid Names (so should be
   marked as 'Y'). 

 * When casting to NCName, then again I'd say that any subtype of
   NCName could be converted without change (i.e. include ENTITY). The
   SV should be converted to a *Name* (as NCName's supertype) rather
   than *string*. In terms of the table, I think that the following
   entries need changing: NMTOKEN should be 'M'; duration should be
   'Y'; boolean should be 'Y'; anyURI should be 'M'; QName should be
   'M'; NOTATION should be 'M'.

 * When casting to NMTOKEN, then the SV should be converted to a
   *token* rather than a *string* for the intermediate value. I think
   that all of the numeric data types could be converted to NMTOKENs
   without any problems, so should be labelled 'Y' in the table (e.g.
   '12.5E-2' is a valid NMTOKEN). Durations can also be labelled 'Y'.
   The date and time formats should be labelled 'M' as date/times that
   include a positive timezone would not be valid NMTOKENs (e.g.
   '2001-09-15-05:00' is a valid NMTOKEN, '2001-09-15+05:00' is not).
   boolean should be 'Y'; anyURI should be 'M'; QName should be 'Y';
   NOTATION should be 'Y'.

 * When casting to NMTOKENS, then the SV should be converted to a
   *token* rather than a *string* for the intermediate value (so that
   whitespace is dealt with properly). The same changes apply as for
   NMTOKEN - numeric types should be labelled 'Y'; duration 'Y';
   date/times 'M'; boolean 'Y'; anyURI 'M', QName 'Y' and NOTATION
   'Y'.

 * It's unclear why most casts to ID, IDREF, IDREFS, ENTITY and
   ENTITIES are disallowed. Is this because they involve extra
   semantics, such as being unique, having an ID to link to and being
   declared? If so, why aren't there similar problems with the
   constructor functions? I would have thought they could be treated
   as NCNames for casting purposes.
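
Several of the 'Y' versus 'M' points in the list above come down to
which characters Name, NCName and NMTOKEN allow. The sketch below is a
deliberately simplified, ASCII-only approximation of those productions
(the real ones admit many more Unicode characters), with helper names
invented for the example:

```python
import re

# ASCII-only approximations of the XML name productions.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")   # a Name with no colon
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")          # any run of name characters


def is_name(s: str) -> bool:
    return NAME.fullmatch(s) is not None


def is_ncname(s: str) -> bool:
    return NCNAME.fullmatch(s) is not None


def is_nmtoken(s: str) -> bool:
    return NMTOKEN.fullmatch(s) is not None


# '123' is an NMTOKEN but not a Name; '+' is not a name character at all.
print(is_nmtoken("123"), is_name("123"))
print(is_nmtoken("2001-09-15-05:00"), is_nmtoken("2001-09-15+05:00"))
```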

12.2 Casting to numeric types.

 * It should be possible to cast some NMTOKEN values to numbers, so
   all conversions from NMTOKEN and NMTOKENS should be marked as 'M'
   rather than 'N'. Also, it will often be possible to cast a gYear
   into a number (unless it includes a timezone or is too large for
   the particular target type), so that row should hold 'M's instead.

 * The fact that the base64Binary and hexBinary datatypes are
   labelled as 'N' and 'M' respectively implies that their lexical
   representations are treated as strings and then converted, rather
   than using values of the bytes. It might be more useful to use the
   numeric values, e.g. cast as integer(xf:hexBinary('FF')) => 255.
   Also it seems strange that cast as integer(xf:hexBinary('10'))
   gives the value 10 rather than the value 16. Users can always cast
   the binary values as strings if they want to get the lexical
   representation, and then convert that into an integer. However,
   both sets of casts should be labelled 'M' due to the large numbers
   that could be involved.

 * It might be helpful to have a function that converts a number of
   seconds into a dateTime and vice versa. I don't know whether that's
   something appropriate for casting or a dedicated function.

 * There should be a special mention about conversion from boolean
   values to numeric values as cast as string(xf:true()) gives 'true',
   which isn't a valid number. Instead, boolean true should evaluate
   to 1 and boolean false to 0.
   
 * When converting to float and double, is there any reason why the
   other numeric types are cast as strings first? I think that the SV
   should be cast to an intermediate NMTOKEN value - if it can't be
   cast to NMTOKEN then it can't be a valid float. I don't think that
   the string's case should be changed - I would say that only 'INF',
   '-INF' and 'NaN' should be recognised case sensitively, because XML
   in general is case sensitive.

 * When converting to decimal from descendant numeric types, there
   doesn't seem to be any point going through strings first since all
   integers, nonPositiveIntegers etc. are valid decimal numbers
   already. The conversion from floats and doubles is problematic.
   First, why round the number? The decimal type can take decimal
   numbers. Second, casting to strings first means that the
   float/double is converted to the canonical representation of the
   float/double, which specifies only one digit to the left of the
   decimal point, so cast as string(xf:float('12.5')) will give
   '1.25E1', which gives an error because it's not a valid lexical
   representation for decimal. It would be better to define the cast
   in numerical terms, perhaps with +INF casting to the largest
   decimal number supported by the implementation and -INF to the
   smallest; positive and negative 0 should cast to 0; casting NaN and
   any numbers that are outside the range of decimal numbers supported
   by the implementation should return an error.

 * I think that the conversions to the integer type could be more
   succinctly described in terms of conversion to decimal followed by
   rounding. Given that there's rounding, I think that the decimal to
   integer conversion should be labelled 'Y' rather than 'M' - the
   range of permitted values is just the same as it originally was.

12.3 Casting to datetime and duration types.

 * It's feasible for tokens, NMTOKENs and NMTOKENSs to be valid
   durations or datetimes, so those rows in the table should be marked
   as 'M'. Names, NCNames, IDs, IDREFs, IDREFSs, ENTITYs, ENTITIESs
   and QNames could be valid durations, so those cells should be
   marked 'M'. Floats, doubles and decimals could be valid gYears, so
   those should be marked 'M'; integers and the other numeric types
   should probably be marked 'Y', though some of them may be 'M' if
   ISO 8601 puts a limit on the size of the year that it can
   represent.

 * The descriptions of the conversions between date/time formats
   refers to functions that aren't described elsewhere in the WD, such
   as xf:get-Year() - presumably these should be
   xf:get-gYear-from-dateTime() and so on. These descriptions should
   be brought into line with the distinction made between constructors
   and casting functions earlier in the section - I thought that
   constructors could only take literal strings as arguments, yet here
   they're illustrated as taking strings constructed with xf:concat()
   etc. Perhaps the constructors should be defined in terms of the
   casting operation rather than vice versa.

 * There are typos in the last list items under the description of
   the casts to gYearMonth, gYear, gMonthDay, gDay and gMonth in that
   they all say that strings are converted via xf:date() when they
   should use xf:gYearMonth(), xf:gYear(), xf:gMonthDay() etc.

 * The description of the construction of gMonthDay values would
   construct invalid values -- they should be in the format '--MM-DD',
   not 'MM-DD'. Similarly, gDay values should be in the format '---DD'
   rather than just 'DD' and gMonth values should be in the format
   '--MM--' rather than just 'MM'.

 * There's no mention in this section about how timezones are dealt
   with. I imagine that they should be included only if the source
   date/time has one.

12.4 Casting to all other simple types.

 * I don't think that there should be any case conversion involved
   in casting to boolean values - the only valid values should be
   'true', 'false', '1' and '0'. The description of casting to
   booleans makes special mention of the situation where 'ST is
   nonPositiveInteger or negativeInteger and SV is 1'. 1 isn't a valid
   value for either of those data types, so you cannot get a
   nonPositiveInteger/negativeInteger with the value 1. The bullet
   point should be deleted, in my opinion, and similarly for the
   following bullet point. I can't see any reason why token, Name,
   NCName, ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN and NMTOKENS
   values shouldn't be converted to boolean values? I think all string
   values should be converted to tokens first, to get rid of any
   superfluous whitespace, so that cast as boolean(' true ') is true.

 * Might it be useful to be able to convert to binary representations
   of strings and numbers, for example to convert between decimal and
   hex, or to encode strings? I'm not sure whether casting is the
   appropriate place to do that, but it ought to be spelt out within
   this section that that *isn't* what happens.

 * I can't see any reason why token, Name, NCName, ID, IDREF,
   IDREFS, ENTITY, ENTITIES, NMTOKEN and NMTOKENS values shouldn't be
   convertible into URIs or QNames, if they follow the correct syntax?
   Again, I think they should be whitespace normalized through
   conversion to tokens.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Friday, 14 September 2001 09:13:24 UTC