Comments on December F&O draft from Ashok Malhotra on 2002-01-09 (www-xml-query-comments@w3.org from January 2002)

From: Ashok Malhotra <ashokma@microsoft.com>
Date: Wed, 9 Jan 2002 14:57:31 -0800
To: <Davidc@nag.co.uk>
Cc: <www-xml-query-comments@w3.org>, <xsl-list@lists.mulberrytech.com>, <w3c-xml-query-wg@w3.org>
Message-ID: <E5B814702B65CB4DA51644580E4853FB019EE73B@red-msg-12.redmond.corp.microsoft.com>
David:
Thank you for your comments on the F&O draft.  I've inserted responses
in the text of your note below.
This is a personal response.  It does not constitute an official
response from the XML Query WG and has not been approved by the WG. 

All the best, Ashok 
===========================================================




3.2 numeric constructors

  Just wanted to voice strong agreement with issue 149: these should
  not be restricted to string literals. At the underlying semantic level
  you need constructors, but at the level of functions exposed to the
  user this can be merged with functions casting from strings (or
  anything else).

[AM] Noted.  The thought was that constructors created typed values from
literals.  You used a cast to return a typed value from an expression.
This argument has been diluted by the fact that you cannot now cast
to/from derived types.

4.2.1 xf:string
I commented on this in the last draft, the text has changed but it is
still contradictory.

I cannot "correctly perceive" it as a no-op if the next paragraph
implies that it does W3C normalisation, which is nothing at all like a
no-op.  Also the example still uses &# notation with text that implies
that there will always be an XML parser in the loop, which
isn't the case for XQuery at present.

[AM] It's meant to be a no-op.  I think it's the example and the
following note that pertains to the example that are causing confusion.
We'll try and fix this.
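
To illustrate why normalisation cannot be perceived as a no-op, here is a minimal Python sketch. It assumes that W3C normalisation behaves like Unicode NFC; that is an assumption made for illustration and is not confirmed by the text quoted in this thread.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT (two code points).
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
normalised = unicodedata.normalize("NFC", decomposed)

# A true no-op would return its argument unchanged; normalisation does not.
changed = normalised != decomposed
```

Here `changed` is true: the argument has two code points and the result has one, so a function that normalises returns a different string from the one it was given.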

4.2.2 xf:normalisedString
  Is there any use case for this? It seems to be rather a bizarre thing.
  The normalisation could be done by the user using translate() if
  desired. The restriction on not having #xD in the argument will be
  almost impossible to maintain in non-XML uses of XPath. XML normalises
  all line ends to #xA, but in a non-XML setting line ends may well be
  #xD or #xD#xA pairs, in which case normalising just #xA and declaring
  #xD an error will mean that an XQuery breaks just by moving the text
  file containing it from one place to another (unless every host
  language for XPath does a similar line end normalisation).

[AM] The F&O provides constructors for all built-in XML Schema
datatypes.  normalizedString is a built-in Schema datatype (derived from
string).  Of course you could create a string and normalize it but then
it would be typed as string not normalizedString.
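
As a rough illustration of doing the normalisation by hand, in the spirit of translate(), the Python sketch below replaces each tab, line feed and carriage return with a space. It assumes the XML Schema whiteSpace "replace" rule and is not the draft's definition of the constructor; the name `normalize_string` is hypothetical.

```python
# Tab, line feed and carriage return each map to a single space,
# following the XML Schema whiteSpace="replace" rule (assumed here).
WHITESPACE_TO_SPACE = str.maketrans("\t\n\r", "   ")


def normalize_string(value: str) -> str:
    """Replace each tab, LF and CR with a space; the length is preserved."""
    return value.translate(WHITESPACE_TO_SPACE)
```

The result is still an ordinary Python `str`, which mirrors the point made in the reply: the characters are normalised, but nothing records that the value is a normalizedString.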

4.4 
xf:lower-case
  Is this collation dependent? I couldn't tell from the previous section
  4.3 what exactly a collation controlled. (ie how do I get that the
  lowercase of I is dotless i in Turkey?)

[AM] Experts tell me that collations cannot be used to do translations.
An example such as the above requires a dictionary that depends on the
country, language, etc.  It does require extra information but it's not
a collation.
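
A minimal Python sketch of the "dictionary that depends on the language" idea follows. The table `LOWERCASE_OVERRIDES` and the function `lower_case` are hypothetical names for illustration, and only the Turkish entries mentioned in the question are included.

```python
# Hypothetical per-language overrides consulted before the default mapping.
LOWERCASE_OVERRIDES = {
    # Turkish: I -> dotless i (U+0131), dotted capital I (U+0130) -> i
    "tr": {"I": "\u0131", "\u0130": "i"},
}


def lower_case(text: str, language: str = "") -> str:
    """Lowercase text, applying language-specific overrides where defined."""
    overrides = LOWERCASE_OVERRIDES.get(language, {})
    return "".join(overrides.get(ch, ch.lower()) for ch in text)
```

With no language given, `lower_case("I")` yields the ordinary "i"; with `language="tr"` it yields the dotless form. The extra information is a language tag plus a mapping table, not a collation.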

xf:match
  This seems to be underspecified in cases where the matching regions
  overlap. If the regexp is aa and the string is aaa, do you just get
  (1) or (1 2)? (This also applies to xf:replace.)
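
The two readings can be sketched in Python as follows. This uses Python's `re` module purely to show the difference; it says nothing about which behaviour the draft intends, and `match_positions` is a hypothetical helper.

```python
import re


def match_positions(pattern: str, text: str, overlapping: bool) -> list:
    """Return the 1-based start positions of matches of pattern in text."""
    if overlapping:
        # A lookahead consumes no characters, so every start position is tried.
        pattern = f"(?={pattern})"
    return [m.start() + 1 for m in re.finditer(pattern, text)]
```

For the regexp aa and the string aaa, `overlapping=False` gives the single position 1 and `overlapping=True` gives positions 1 and 2.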

  Slightly worried that, since xpath sequences do not nest, this
  semantic will prevent any future extension to allow sed/emacs/perl
  style numbered subexpressions. Also it forces the system always to
  match the entire string, which may be rather long, rather than
  stopping once a match is found.

  If instead it just returned the position of the first match a
  plausible extension would be that if the regexp was
  \(aa\)xx\(bb\)
  then what was returned was a sequence consisting of the position of
  the entire match followed by the positions of each of the
  subexpressions.
  a future extension to xf:replace could then use (something equivalent
  to &1 or $1 or \1 in current regexp languages) to access the
  matched subexpressions in the replacement text.
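
A minimal sketch of the suggested extension, written in Python for illustration only. Python writes groups as `(aa)` where the example above writes `\(aa\)`; the helper `match_with_groups` is hypothetical.

```python
import re


def match_with_groups(pattern: str, text: str) -> list:
    """1-based position of the first match, then of each subexpression."""
    m = re.search(pattern, text)
    if m is None:
        return []
    # Group 0 is the entire match; groups 1..n are the subexpressions.
    return [m.start(i) + 1 for i in range(m.re.groups + 1)]


positions = match_with_groups(r"(aa)xx(bb)", "zzaaxxbb")

# Replacement text that refers to the matched subexpressions by number.
swapped = re.sub(r"(aa)xx(bb)", r"\2xx\1", "zzaaxxbb")
```

Stopping at the first match means the rest of a long string need not be scanned, and the flat result still fits a non-nested sequence.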

 [AM] Agree.  There are two issues that request amplification of the
semantics.

5.1.3
 If this only takes a string literal (as commented above I think all
 user accessible functions should not have this restriction) then why
 do a case mapping? If it has to be a literal you may as well demand
 "TRUE" rather than " true". (Also if it only takes literals it serves
 no purpose at the user level as it could always be replaced by true(),
 false() or an error.)

[AM] I agree that if it takes a string literal it's useless, as true() and
false() cover the same ground.

5.2.1 op:boolean and
  the text says it backs up the "and" operator but I think that has to
  be backed up as an if clause, to get the correct semantics if one
  operand could raise an error.

[AM] The only error would be if one or both operands were not booleans.
This would be caught when the operands of the operator were type
checked.
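
The concern about errors in an operand can be illustrated with a small Python sketch. It assumes a function evaluates all of its arguments before it is called, as Python does; whether the draft intends either behaviour is not settled by this exchange. The names `and_function` and `failing_operand` are hypothetical.

```python
def and_function(left: bool, right: bool) -> bool:
    # Both arguments are already evaluated by the time the body runs.
    return left and right


def failing_operand() -> bool:
    raise ValueError("operand raised an error")


# Operator form: the right operand is never evaluated when the left is false.
conditional_result = False and failing_operand()

# Function form: the error surfaces even though the left operand is false.
try:
    and_function(False, failing_operand())
    function_raised = False
except ValueError:
    function_raised = True
```

The operator form yields false without touching the second operand, while the function form raises, which is the difference between backing "and" with a function and backing it with an if clause.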


5.3.2 xf:not3
  SQL can treat null specially in three valued logic as it knows that
  any nulls are there for that purpose. XPath should not assume any
  special semantics for an empty sequence. This might be an "unknown
  value", in which case a three valued logic might give reasonable
  results, or it might be a fixed default value, or anything else,
  depending on the document type. For a particular class of documents
  the user can define not3 if it makes sense, but functions assuming a
  particular interpretation of () should not be in the core of a
  general XML query language such as XPath.

[AM] Noted.
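
For readers unfamiliar with three valued logic, here is a minimal Python sketch of a user-defined not3, with `None` standing in for the empty sequence read as "unknown". That reading is exactly the assumption being questioned above, so treat it as one possible interpretation and not as the draft's definition.

```python
from typing import Optional


def not3(value: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: unknown (None) stays unknown."""
    return None if value is None else not value


# Under a two-valued reading, an empty sequence coerces to false,
# so negating it gives true instead of "unknown".
two_valued = not ()
```

The two results differ for the empty case, which is why the choice of interpretation matters for documents where () means something other than "unknown".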


6
I think that all new functions should match the existing XPath naming
convention, i.e. lowercase, hyphen-separated words. When mapping names
from other languages that have other naming conventions (e.g. camel
case), some extra - may need to be added, and the names lowercased.
So I think dateTime should be date-time throughout, gMonthDay should be
g-month-day, etc.

[AM] We tried to do this except where the name includes the name of an
XML Schema datatype which uses intercapitalization, hence xf:dateTime().

especially bad is the capital C in get-Century but lowercase h in get-hour

[AM] That's a bug.  I'll fix it.
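
The proposed mapping from intercapitalized names to hyphen-separated lowercase names is mechanical, as this small Python sketch suggests. The helper `to_xpath_name` is hypothetical and only covers the simple cases discussed here.

```python
import re


def to_xpath_name(name: str) -> str:
    """Insert a hyphen between a lowercase letter or digit and a capital,
    then lowercase the whole name."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name).lower()
```

Applied to dateTime and gMonthDay this gives date-time and g-month-day, and it also turns get-Century into get-century without doubling the hyphen.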

9
I think I read this as saying that eq compares the base64 encoded string
as it appears in the XML (including any white space that would be
ignored in the base64 decoding). A more interesting equality is to
compare the base64 encoded strings ignoring white space (which
effectively compares the encoded data).

[AM] Noted.  Frankly, we are not sure how important functions on base64
and hexBinary are.  Would appreciate feedback.
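
The difference between the two notions of equality can be sketched in Python. This is an illustration under the assumption that the lexical forms are well-formed base64; `base64_data_equal` is a hypothetical helper, not a function from the draft.

```python
import base64


def base64_data_equal(a: str, b: str) -> bool:
    """Compare two base64 lexical forms by the octets they encode."""
    def decode(s: str) -> bytes:
        # Drop all white space before decoding.
        return base64.b64decode("".join(s.split()))
    return decode(a) == decode(b)


wrapped = "aGVs\n bG8="   # same data, with a line break and a space inside
compact = "aGVsbG8="
```

Here `wrapped == compact` is false as a string comparison, while `base64_data_equal(wrapped, compact)` is true because both encode the same octets.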

11.1.4 xf:deep-equal
While many queries will need some version of deep equality, the exact
details depend very much on the job in hand (ignore comments? white
space? element names?) I think it would be better to remove this and
have the xquery and xslt drafts give examples of deep equality
definitions in their respective user-defined function syntax.

[AM] It was felt that a structural equality function was needed.  I
agree that we need arguments to indicate whether comments and PIs should
be ignored or not.  
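
A minimal Python sketch of a structural equality function with one such argument follows. It uses the standard library's `xml.etree.ElementTree`, whose default parser discards comments and processing instructions, so only the white space choice is exposed here; the function and its `ignore_whitespace` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def deep_equal(a, b, ignore_whitespace=True):
    """Compare two elements by name, attributes, text and children."""
    def text(s):
        s = s or ""
        return s.strip() if ignore_whitespace else s

    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and text(a.text) == text(b.text)
        and text(a.tail) == text(b.tail)
        and len(a) == len(b)
        and all(deep_equal(x, y, ignore_whitespace) for x, y in zip(a, b))
    )


compact = ET.fromstring("<a><b>1</b></a>")
indented = ET.fromstring("<a>\n  <b>1</b>\n</a>")
```

The two trees compare equal when white space is ignored and unequal when it is not, which shows how much the answer depends on the options chosen.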


11.1.7 xf:copy
I commented on this last time, but _please_ change the name of this
function, it is massively confusing given that in XSLT copy does a
shallow copy.

[AM] deep-copy?

Given the note that XSLT will not support this, it should not be in the
core at all and should be moved into an XQuery-specific function library.

[AM] At this time there is but one function library.

same comments for xf:shallow.
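
The naming confusion is easier to see with a concrete shallow versus deep copy. The Python sketch below is only an analogy: Python's shallow copy shares the children of the copied node, whereas a shallow copy in XSLT copies the node without its children. The dictionary layout is hypothetical.

```python
import copy

tree = {"name": "root", "children": [{"name": "child", "children": []}]}

shallow = copy.copy(tree)      # new top-level node, children shared
deep = copy.deepcopy(tree)     # new top-level node and new descendants

# Changing a descendant of the original shows which copy is independent.
tree["children"][0]["name"] = "renamed"
```

After the change, the shallow copy sees "renamed" and the deep copy still holds "child", so a name that does not say which of the two is meant invites mistakes.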


11.2
xf:if-absent appears to be a workaround for the loss of the XPath 1.0
semantic that one can test for empty node sets (sequences) just by
coercing to boolean. I very much regret the loss of this semantic.
If it could be restored then if-absent would be redundant.

if-empty is another example of core functions assuming too much about
the way data is encoded in XML. Testing for empty data means different
things to different people, and all of them are simply expressible with
existing XPath constructs. There is no need for this function and it
should be removed.

In both cases having "if" functionality as a function has the bad effect
that the operand is always evaluated even in the false case, so a user
would be well advised not to use these functions and instead use an if
expression. 
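
The cost of "if" functionality packaged as a function can be shown by counting evaluations. This Python sketch assumes arguments are evaluated before the call, as stated above; an implementation could in principle evaluate lazily. The names `if_absent` and `expensive_default` are hypothetical.

```python
calls = []


def expensive_default():
    calls.append("evaluated")
    return "default"


def if_absent(value, default):
    # As an ordinary function, `default` was computed before this body runs.
    return default if value is None else value


via_function = if_absent("present", expensive_default())
calls_after_function = len(calls)

# A conditional expression only evaluates the branch it needs.
value = "present"
via_conditional = value if value is not None else expensive_default()
calls_after_conditional = len(calls)
```

Both forms return "present", but only the function form paid for computing the default that was never used.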

12.2.11
why is this sublist and not subsequence?

[AM] Good point!


12.4
If the values of the nodes in the sequence are themselves list valued
do all the terms in the individual lists get aggregated, and  in the
case of avg how many terms is the average over?

[AM] We do not have nested sequences.  So, we start by creating a
one-level sequence.

The sum of an empty sequence should be 0 not ().

[AM] We follow the SQL definition.
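
The two answers above can be made concrete with a short Python sketch. It assumes list-valued items are flattened into a one-level sequence before aggregating and uses `None` to stand in for (); the helper names are hypothetical.

```python
def flatten(values):
    """Turn a sequence that may contain lists into a one-level list."""
    flat = []
    for v in values:
        if isinstance(v, (list, tuple)):
            flat.extend(flatten(v))
        else:
            flat.append(v)
    return flat


def sql_sum(values):
    """SQL-style sum: None (standing in for ()) when there is no input."""
    flat = flatten(values)
    return sum(flat) if flat else None


def avg(values):
    """Average over every term of the flattened sequence."""
    flat = flatten(values)
    return sum(flat) / len(flat) if flat else None
```

With the input `[[1, 2], [3]]` the average is taken over three terms, not two. For an empty input `sql_sum` returns `None`, whereas Python's built-in `sum([])` returns 0, which is the convention argued for in the comment.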


12.5
do xf:id and idref only operate on the current document (I assume so,
but it isn't stated)

[AM] There are issues on the scope of these functions.

filter sounds like it may possibly be useful, but it's a bad name.

[AM] The name was generated by the XQuery syntax folks.

document has lost most of the functionality in the xslt 1.0 version,
which needs to be restored.

[AM] I agree.  There is an issue to that effect.

14
as commented above I believe that casting and constructors should be
merged at the user level (although of course they need to be distinct in
the formal semantics). Given a function that casts, there is no reason
to make available the constructor which has the same functionality but
is restricted to having literals as input. The constructor can
presumably be optimised but any optimising compiler ought to be able to
spot a function call with a literal argument and do the same I'd have
thought (but I've never written an optimising compiler:-)
Received on Wednesday, 9 January 2002 17:58:04 UTC