comments on December F&O draft from David Carlisle on 2002-01-04 (www-xml-query-comments@w3.org from January 2002)

From: David Carlisle <davidc@nag.co.uk>
Date: Fri, 4 Jan 2002 19:02:51 GMT
To: www-xml-query-comments@w3.org
CC: xsl-list@lists.mulberrytech.com
Message-Id: <200201041902.TAA22481@penguin.nag.co.uk>
3.2 numeric constructors

  Just wanted to voice strong agreement with issue 149: these should
  not be restricted to string literals. At the underlying semantic level
  you need constructors but at the functions-exposed to user level
  this can be merged with functions casting from strings (or anything
  else).

4.2.1 xf:string
I commented on this in the last draft; the text has changed but it is
still contradictory.

I cannot "correctly perceive" it as a no-op if in the next paragraph it
is implied that it does W3C normalisation, which is nothing at all like a
no-op.  Also the example still uses &# notation with text that implies
that there will always be an XML parser in the loop, which
isn't the case for Xquery at present.

4.2.2 xf:normalisedString
  Is there any use case for this? It seems to be rather a bizarre thing.
  The normalisation could be done by the user using translate() if
  desired. The restriction on not having #xD in the argument will be
  almost impossible to maintain in non XML uses of Xpath. XML normalises
  all line ends to #xA but in a non XML setting line ends may well be
  #xD or #xD#xA pairs, in which case normalising just #xA and declaring
  #xD an error will mean that an Xquery breaks just by moving the text
  file containing it from one place to another (unless  every host
  language for xpath does a similar line end normalisation)
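
To make the line-end point concrete, here is a minimal sketch (in Python
rather than Xpath) of the normalisation every non-XML host would have to
perform before calling a function that rejects #xD. The name
normalise_line_ends and the sample query text are made up for this
illustration; nothing here comes from the draft.

```python
def normalise_line_ends(text):
    """Map #xD#xA pairs and lone #xD to #xA, as an XML parser does."""
    # Replace pairs first so that one pair does not turn into two line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")

# The same query text saved with three different line-end conventions.
unix_text = "for $x in /a\nreturn $x"
dos_text = "for $x in /a\r\nreturn $x"
mac_text = "for $x in /a\rreturn $x"
```

Without such a step, only the first of the three variants would be free
of #xD.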


4.4 
xf:lower-case
  Is this collation dependent? I couldn't tell from the previous section
  4.3 what exactly a collation controlled. (ie how do I get that the
  lowercase of I is dotless i in Turkey?)
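
As an illustration of the Turkish case, a rough sketch in Python of what
a language-sensitive lower-case might look like. The lang parameter is
hypothetical (the draft defines no such argument), and only the two
Turkish special cases are handled.

```python
def lower_case(text, lang=""):
    """Lowercase text; 'lang' is a hypothetical parameter, not in the draft."""
    if lang == "tr":
        # Turkish: I (U+0049) maps to dotless i (U+0131),
        # and dotted capital I (U+0130) maps to plain i.
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.lower()

default_result = lower_case("I")        # language-neutral mapping
turkish_result = lower_case("I", "tr")  # Turkish mapping
```

The two results differ, which is why the function needs to say whether
(and how) a collation or language is taken into account.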

xf:match
  This seems to be underspecified in cases where the matching regions
  overlap. If the regexp is aa and the string is aaa, do you just get
  (1) or (1 2)? (This also applies to xf:replace.)
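
The two readings can be shown with a small Python sketch. The function
match_positions and its overlapping flag are invented for this
illustration; positions are 1-based to match the (1) and (1 2) above.

```python
import re

def match_positions(pattern, text, overlapping):
    """Return the 1-based start positions of matches of pattern in text."""
    if overlapping:
        # A zero-width lookahead consumes nothing, so the scan restarts
        # one character after each match start.
        regex = re.compile("(?=" + pattern + ")")
    else:
        # Ordinary scanning resumes after the end of the previous match.
        regex = re.compile(pattern)
    return [m.start() + 1 for m in regex.finditer(text)]

non_overlapping = match_positions("aa", "aaa", overlapping=False)
with_overlaps = match_positions("aa", "aaa", overlapping=True)
```

Both behaviours are easy to implement, so the draft has to say which one
is meant.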

  Slightly worried that, since xpath sequences do not nest, this
  semantic will prevent any future extension to allow sed/emacs/perl
  style numbered subexpressions. Also it forces the system always to
  match the entire string, which may be rather long, rather than
  stopping once a match is found.

  If instead it just returned the position of the first match a
  plausible extension would be that if the regexp was
  \(aa\)xx\(bb\)
  then what was returned was a sequence consisting of the position of
  the entire match followed by the positions of each of the
  subexpressions.
  A future extension to xf:replace could then use (something equivalent
  to &1 or $1 or \1 in current regexp languages) to access the
  matched subexpressions in the replacement text.
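
A sketch of that possible extension, using Python's re module as a
stand-in for the "current regexp languages" mentioned above. Python
writes groups as (aa) rather than \(aa\); the function name
first_match_positions and the sample string are made up.

```python
import re

def first_match_positions(pattern, text):
    """1-based start of the first match, then of each subexpression."""
    m = re.search(pattern, text)
    if m is None:
        return []
    # Group 0 is the whole match; groups 1..n are the subexpressions.
    # A group that took no part in the match has start -1, so it would
    # appear as 0 in this list.
    return [m.start(i) + 1 for i in range(m.re.groups + 1)]

positions = first_match_positions(r"(aa)xx(bb)", "zzaaxxbbzz")

# The replacement text refers back to the subexpressions with \1 and \2.
swapped = re.sub(r"(aa)xx(bb)", r"\2xx\1", "zzaaxxbbzz")
```

Because only the first match is reported, the result stays a flat
sequence and the search can stop as soon as a match is found.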

 

5.1.3
 If this only takes a string literal (as commented above, I think all
 user accessible functions should not have this restriction) then why
 do a case mapping? If it has to be a literal you may as well demand
 "TRUE" rather than " true". (Also, if it only takes literals it serves
 no purpose at the user level, as it could always be replaced by true(),
 false() or an error.)

5.2.1 op:boolean and
  The text says it backs up the "and" operator but I think that has to
  be backed up as an if clause, to get the correct semantics if one
  operand could raise an error.
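
The difference can be sketched in Python, on the assumption that a
function evaluates all of its arguments before it is called. The names
boolean_and, risky, as_function and as_conditional are invented for the
example.

```python
def boolean_and(left, right):
    """'and' as an ordinary function: both arguments are evaluated first."""
    return left and right

def risky():
    """Stands in for an operand whose evaluation raises an error."""
    raise ValueError("operand raised an error")

def as_function():
    # risky() runs before boolean_and is entered, so this always raises.
    return boolean_and(False, risky())

def as_conditional(left):
    # Equivalent of: if (left) then risky() else false
    if left:
        return risky()
    return False
```

Here as_conditional(False) returns false without touching the second
operand, whereas as_function() raises the error.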


5.3.2 xf:not3
  SQL can treat null specially in three valued logic as it knows that any
  nulls are there for that purpose. Xpath should not assume any special
  semantics for an empty sequence. This might be an "unknown value"
  in which case a three valued logic might give reasonable results, or
  it might be a fixed default value, or anything else, depending on the
  document type. For a particular class of documents the user can define
  not3 if it makes sense, but functions assuming a particular
  interpretation of () should not be in the core of a general XML query
  language such as Xpath.
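
To show that the two interpretations really give different answers, a
small Python sketch in which None stands in for the empty sequence.
The function not_with_default and its default parameter are
hypothetical; only the name not3 comes from the draft.

```python
def not3(value):
    """Three-valued 'not': None means 'unknown' and stays unknown."""
    return None if value is None else not value

def not_with_default(value, default=False):
    """Alternative reading: an absent value is a fixed default."""
    return not (default if value is None else value)

unknown_reading = not3(None)              # stays unknown
default_reading = not_with_default(None)  # absent treated as false
```

Which of these is right depends on the document type, so either one is
better written by the user than built into the core.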


6
I think that all new functions should match the existing xpath naming
convention, ie lowercase - separated words. When mapping names from
other languages that have other naming conventions (eg camel case) then
some extra - may need to be added, and the names lowercased.
So I think dateTime should be date-time throughout, gMonthDay should be
g-month-day, etc.

Especially bad is the capital C in get-Century but lowercase h in get-hour.
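
The mapping is mechanical. A minimal Python sketch, with to_xpath_name
being a made-up helper name:

```python
import re

def to_xpath_name(name):
    """Convert a camel-case name to lowercase words separated by '-'."""
    # Insert '-' before an uppercase letter that directly follows a
    # lowercase letter or digit, then lowercase the whole name.
    hyphenated = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    return hyphenated.lower()

examples = {n: to_xpath_name(n)
            for n in ["dateTime", "gMonthDay", "get-Century", "get-hour"]}
```

Names that already follow the convention, such as get-hour, pass
through unchanged.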

9
I think I read this as saying that eq compares the base64 encoded string
as it appears in the XML (including any white space that would be
ignored in the base64 decoding). A more interesting equality is to
compare the base64 encoded strings ignoring white space (which
effectively compares the encoded data).
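
A short Python sketch of the two equalities. The function base64_equal
and the sample strings are invented for the example; the second string
is the first one wrapped and indented as it might appear in a document.

```python
import base64

def base64_equal(a, b):
    """Compare two base64 lexical forms by the data they encode."""
    def decode(s):
        # Drop all white space, then decode to the underlying octets.
        return base64.b64decode("".join(s.split()))
    return decode(a) == decode(b)

plain = "SGVsbG8sIHdvcmxkIQ=="
wrapped = "SGVsbG8s\n  IHdvcmxkIQ==\n"

same_as_strings = (plain == wrapped)          # string comparison
same_as_data = base64_equal(plain, wrapped)   # comparison of encoded data
```

The string comparison says the values differ even though they encode
the same octets.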


11.1.4 xf:deep-equal
While many queries will need some version of deep equality, the exact
details depend very much on the job in hand (ignore comments? white
space? element names?) I think it would be better to remove this and
have the xquery and xslt drafts give examples of deep equality
definitions in their respective user-defined function syntax.
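
As an illustration of how much the details vary, here is one possible
definition sketched in Python with xml.etree.ElementTree (not in the
xquery or xslt function syntax). The ignore_whitespace flag is an
assumption of this sketch. Note that ElementTree's default parser
already discards comments, which is itself one of the choices a
built-in function would have to make for everybody.

```python
import xml.etree.ElementTree as ET

def deep_equal(a, b, ignore_whitespace=False):
    """Compare two elements recursively; one possible definition among many."""
    def text(value):
        value = value or ""
        return value.strip() if ignore_whitespace else value

    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and text(a.text) == text(b.text)
        and text(a.tail) == text(b.tail)
        and len(a) == len(b)
        and all(deep_equal(x, y, ignore_whitespace) for x, y in zip(a, b))
    )

compact = ET.fromstring("<a><b>1</b><c/></a>")
indented = ET.fromstring("<a>\n  <b>1</b>\n  <c/>\n</a>")

strict = deep_equal(compact, indented)
relaxed = deep_equal(compact, indented, ignore_whitespace=True)
```

The same pair of documents is equal under one setting and unequal under
the other.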


11.1.7 xf:copy
I commented on this last time, but _please_ change the name of this
function, it is massively confusing given that in XSLT copy does a
shallow copy.

Given the note that XSLT will not support this, it should not be in the
core at all and should be moved into an XQuery-specific function library.

Same comments for xf:shallow.


11.2
xf:if-absent appears to be a workaround for the loss of the Xpath 1.0
semantic that one can test for empty node sets (sequences) just by
coercing to boolean. I very much regret the loss of this semantic.
If it could be restored then if-absent would be redundant.

if-empty is another example of core functions assuming too much about the
way data is encoded in XML. Testing for empty data means different
things to different people, and all of them are simply expressible with
existing Xpath constructs. There is no need for this function and it
should be removed.

In both cases having "if" functionality as a function has the bad effect
that the operand is always evaluated even in the false case, so a user
would be well advised not to use these functions and instead use an if
expression. 
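
The cost can be made visible with a small Python sketch that counts how
often the fallback operand is evaluated. It assumes function arguments
are evaluated before the call; if_absent here is a simplified stand-in
for the draft function and expensive_default is made up.

```python
calls = []

def expensive_default():
    """Stands in for a costly operand; records each evaluation."""
    calls.append("evaluated")
    return "fallback"

def if_absent(value, default):
    """'if' as a function: 'default' was already computed by the caller."""
    return default if value is None else value

value = "present"

# Function form: the fallback is computed and then thrown away.
eager = if_absent(value, expensive_default())
count_after_function = len(calls)

# Conditional expression: the fallback is never evaluated.
lazy = value if value is not None else expensive_default()
count_after_conditional = len(calls)
```

Both forms give the same result, but only the function form pays for
the unused operand.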

12.2.11
Why is this sublist and not subsequence?



12.4
If the values of the nodes in the sequence are themselves list valued,
do all the terms in the individual lists get aggregated, and in the
case of avg, how many terms is the average over?

The sum of an empty sequence should be 0 not ().
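
One possible answer to both questions, sketched in Python: flatten the
list-valued items first, so that avg divides by the number of atomic
values, and let the sum of nothing be 0. The function flatten and the
sample data are made up; this is an assumption about what the draft
could specify, not what it says.

```python
def flatten(values):
    """Expand list-valued items so every atomic value is counted once."""
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat

node_values = [[1, 2, 3], 6]   # two nodes, the first one list valued

terms = flatten(node_values)
flat_avg = sum(terms) / len(terms)   # average over four terms, not two
empty_sum = sum([])                  # additive identity for no terms
```

With 0 as the sum of an empty sequence, the sum of a concatenation is
always the sum of the two sums.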


12.5
Do xf:id and idref only operate on the current document? (I assume so,
but it isn't stated.)

filter sounds like it may possibly be useful, but it's a bad name.

document has lost most of the functionality in the xslt 1.0 version,
which needs to be restored.

14
As commented above, I believe that casting and constructors should be
merged at the user level (although of course they need to be distinct in
the formal semantics). Given a function that casts, there is no reason
to make available the constructor which has the same functionality but
is restricted to having literals as input. The constructor can
presumably be optimised, but any optimising compiler ought to be able to
spot a function call with a literal argument and do the same, I'd have
thought (but I've never written an optimising compiler:-)



David

Received on Friday, 4 January 2002 14:03:20 UTC