Re: Jim Melton: XML Query WG review of RIF Datatypes and Built-Ins 1.0 from Jim Melton on 2009-10-01 (public-rif-comments@w3.org from October 2009)

From: Jim Melton <jim.melton@oracle.com>
Date: Thu, 01 Oct 2009 17:16:38 -0600
To: Sandro Hawke <sandro@w3.org>
Cc: Jim Melton <jim.melton@oracle.com>, public-rif-comments@w3.org, w3c-xsl-query@w3.org
Message-Id: <7.0.1.0.2.20091001165312.0bf62668@oracle.com>
Sandro,

I think we're about to reach closure ;^)

At 9/30/2009 05:23 PM, Sandro Hawke wrote:
> > >
> > > > 6) In section 4.3, we learn that "Itruth Iexternal( ?arg1;
> > > > pred:is-literal-not-DATATYPE ( ?arg1 ) )(s1) = t if and only
> > > > if s1 is in the value space of one of the datatypes in DTS
> > > > (http://www.w3.org/TR/rif-dtb/#sec-data-types) but not in the
> > > > value space of the datatype with shortname DATATYPE, and f
> > > > otherwise."  We believe that means that the predicate
> > > > pred:is-literal-not-integer returns f if the value of its
> > > > argument is not in the value space of any datatype in DTS!  If
> > > > that is true, then it is highly misleading, because returning
> > > > false implies that the value is a literal of type integer.  We
> > > > recommend that you reconsider this definition so that the
> > > > predicate returns true when the value is either (a) not in the
> > > > value space of any datatype in DTS or (b) in the value space
> > > > of some data type in DTS but not in the value space of the
> > > > specified datatype.
> > >
> > >We believe the definition as given is correct, but that the intended
> > >meaning of negative guards was not clear.  We have added this note to
> > >the end of section 4.3:
> > >
> > >"Note: The semantics of negative guards may be surprising. The
> > >is-literal-not-String guard essentially asks, "Is this a literal, and
> > >(if it is) is it something other than a String?" It could also be read
> > >as "Is this a decimal or a float or a double or a date or a dateTime,
> > >etc, [for every datatype except string] ?". The negative guards are
> > >formulated like this to allow for rules which detect, for instance,
> > >some kinds of bad inputs, while still using the open world assumption
> > >of some RIF dialects."
> > >
> > >Hopefully, that's detailed enough to show that the definition is
> > >correct.  A more-detailed explanation of why we can't provide
> > >is-not-String seems out-of-scope for this document.
> >
> > Jim: Thanks for the explanation. I now understand why the predicate
> > has the semantics that it does.  I must say, though, that I find the
> > name itself unfortunate because of its counter-intuitiveness.  Full
> > disclosure: I have long advised people to not depend on intuition or
> > on Webster's Dictionary for the meaning of keywords and function names
> > in a programming language, but to depend solely on the language
> > spec.  This is obviously a case where I am not following my own
> > advice.  But I also advise designers of languages to avoid
> > consciously using counter-intuitive terms whenever possible.
> >
> > Jim: In spite of the conflicting tone of the preceding paragraph, I
> > do not ask that you reconsider the name of the predicate, because
> > there is great value in having consistency amongst the names used for
> > similar purposes in a programming language and that consideration
> > probably outweighs the counter-intuitiveness (which might not affect
> > every reader anyway).
>
>Can you explain how the name seems counter-intuitive to you?  Given the
>meaning (test that something is a literal and is not a string), it seems
>to me that is-literal-not-String is pretty clear.  It could also be
>is-literal-and-is-not-String, but that doesn't seem that much clearer.
>
>I guess I'm surprised that, knowing the meaning, you find the name to be
>a problem.  That surprise makes me think there's some aspect of the
>situation I'm missing.

Maybe I'm still misunderstanding the semantics.  Our original comment 
said "We believe that means that the predicate 
pred:is-literal-not-integer returns f if the value of its argument is 
not in the value space of any datatype in DTS!"  If we substitute 
"String" for "integer", then it gets even more confusing (to me, at 
least).  Consider a sequence/list/whatever of 16-bit values 
representing Unicode characters encoded in UTF-16.  If one of those 
16-bit values, but not two consecutive such values, is a number 
corresponding to the space reserved in UTF-16 for surrogate pairs, 
then that value is not in the value space of any data type in 
DTS.  Another example that doesn't depend on UTF-16 is the 
representation (in UTF-8, UTF-16, or UCS4) of the value "FFFF", which 
is defined by Unicode to not be a character.  By definition, it's not 
an xs:string, nor, I believe, a String.  (One might also ask if it is 
a literal.  In XQuery, XSLT, and XML Schema, I believe that it is not 
a literal, because it contains bits that do not represent anything 
valid.  But let's suppose that it is, in the sense that it's 
otherwise in the form of a literal value in the language.)

It is, however, *clearly* not in the value space of *any* data type 
in DTS.  Therefore, the definition of the is-literal-not-String 
predicate is not satisfied; that definition requires that the 
predicate return t "if and only if s1 is in the value space of one 
of the datatypes in DTS" 
(http://www.w3.org/TR/rif-dtb/#sec-data-types).  Since that 
bit/byte/whatever sequence is not a valid value of any XML Schema 
data type, it fails that condition.  As a result, your "if and only 
if" is not satisfied, and thus the predicate should not return true.

One solution would be to change the name of the predicate to 
"is-not-String", but that is not really what you're trying to 
accomplish (I think), besides which it violates the pattern chosen 
for the names of predicates in this category.  The other solution 
would be to clarify, at least for the negative guards, the phrase 
"if s1 is in the value space of one of the datatypes in DTS" from 
the definition.  I think the latter solution is preferable (and I 
apologize for giving the impression in my previous response that it 
was specifically the name of the predicate that I found confusing).
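
To make the difference concrete, here is a minimal Python sketch of the two readings.  It is only an illustration: the three-entry DTS table and the function names is_literal_not and is_not are hypothetical stand-ins, not anything defined in the RIF spec.

```python
# Hypothetical, heavily simplified model of DTS:
# datatype shortname -> test for membership in its value space.
DTS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
}


def is_literal_not(datatype, value):
    """Negative guard as the quoted definition reads: true iff the value
    is in SOME value space in DTS but not in the named one."""
    in_some_value_space = any(member(value) for member in DTS.values())
    return in_some_value_space and not DTS[datatype](value)


def is_not(datatype, value):
    """Plain negation (the reading the name might suggest): true whenever
    the value is outside the named value space."""
    return not DTS[datatype](value)


# A stand-in for something outside every value space in DTS
# (an assumption for illustration only).
outside = object()

print(is_literal_not("string", 42))       # in DTS, not a string
print(is_literal_not("string", "abc"))    # a string
print(is_literal_not("string", outside))  # outside DTS entirely
print(is_not("string", outside))          # plain negation differs here
```

The two functions agree on every value that lies in some value space of DTS; they differ only on values outside all of them, which is exactly the case raised above.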


> > >
> > > > 11) Section 4.11.1. Is it wise to number positions in a list starting
> > > > from zero, while numbering characters within a string (for example, in
> > > > the substring() function) from 1?  We think this inconsistency will
> > > > confuse your readers and users.
> > >
> > >We struggled with this some more today, but decided to leave indexing
> > >as is.  It's a really unfortunate situation, and we can't see any way
> > >forward which won't confuse users.  Given that lists are substantially
> > >different from xpath sequences, well, hopefully people will understand
> > >and tolerate this approach.
> >
> > Jim: This is most unfortunate, and may be the only thing to which we
> > might actually object.  Please note that our comment didn't use the
> > example of sequences, because your language doesn't contain that
> > concept; we used strings, which your language does have.  Yes, lists
> > and character strings are not the same thing, but many application
> > programmers will be familiar with treating strings as lists of
> > characters.  Numbering lists of characters (that is, strings)
> > starting with 1, but numbering other kinds of lists starting with 0
> > is, in our opinion, likely to be a serious source of confusion and
> > erroneous code.  We strongly urge you to further consider this
> > decision.  Assuming that you will choose to publish the CR without
> > making a change here, I must advise you that our WG might choose to
> > make an additional comment on this point during your CR
> > period.  (Personally, I sincerely doubt that any of our members would
> > go so far as to raise a formal objection, so you need not fear
> > that outcome.)
> >
> > Jim: If you do not make changes to resolve this concern, then please
> > be sure that the spec clearly points out the different base value for
> > the two kinds of positions.  That is probably best done with
> > non-normative notes in two places -- where character strings are
> > defined and where lists are defined, cross referencing one another.
>
>Perhaps a short-term solution is for us to mark the indexing (perhaps
>for strings and lists?) as At Risk.  That lets us defer the decision
>until the end of CR, and get feedback from implementors and the
>community.  It would allow us to change this later, without a second
>Last Call and second CR.

I have no problem with this approach.  You should consider exactly 
what is "At Risk", though.  Is it the functionality of indexing in 
general, or the details of whether indexing is 0-based vs 
1-based?  In my admittedly limited experience, it is entire 
functionalities that one marks At Risk, not different approaches to 
specifying the same functionality.  But I am not in any way opposed 
to your taking this approach to this issue.
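
For readers following along, the off-by-one hazard can be sketched in a few lines of Python.  The helper names substring_1based and list_get_0based are hypothetical; they only mimic the two numbering conventions discussed in comment 11 and are not the actual built-ins.  The substring helper is deliberately simplified to integer arguments with start >= 1.

```python
def substring_1based(s, start, length=None):
    """Substring with 1-based character positions (simplified sketch:
    integer arguments only, start must be >= 1)."""
    if start < 1:
        raise ValueError("this sketch only handles start >= 1")
    begin = start - 1
    end = len(s) if length is None else begin + length
    return s[begin:end]


def list_get_0based(items, position):
    """List access with 0-based positions."""
    return items[position]


word = "motor"
chars = list(word)  # the same data viewed as a list of characters

first_char = substring_1based(word, 1, 1)  # position 1 is the first character
first_item = list_get_0based(chars, 0)     # position 0 is the first item
item_at_1 = list_get_0based(chars, 1)      # position 1 is the SECOND item

print(first_char, first_item, item_at_1)
```

The same position number, 1, selects the first character of the string but the second item of the list, which is the kind of error the cross-referencing notes are meant to head off.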


>For myself, mostly programming Python these days, I find it odd that in
>xpath and RIF I can't use the same functions on lists and strings
>(treating strings as sequences of 1-character strings).  The fact that
>concat is for strings and concatenate is for sequences/lists is
>... unfortunate.

Having never programmed in Python, I can't identify as easily as you, 
but I know other languages that behave much the same in this 
respect.  I agree that the concat/concatenate dichotomy is 
unfortunate.  However, in Functions & Operators, we do have the 
fn:concat function, but do not have an *fn:* function by the name 
fn:concatenate (yes, we have the op: definitional function of that 
name, but that's not a problem in F&O directly, or in XQuery and 
XSLT, since the op: functions are not available to application 
writers).  But, because you've adopted both the fn: functions and the 
op: functions, you do face that problem.  Apologies!

>(I wonder if we could define our list builtins to also
>work on strings, as if they were lists like that....  Ah well, maybe
>it's too late for that.)

Ummm...probably ;^)

Hope this helps,
    Jim


>I'm hoping that a more elegant DTB 2.0 based on user experience won't be
>too many years off.

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
   Chair, W3C XML Query WG; XQX (etc.) editor       Fax : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
========================================================================
=  Facts are facts.   But any opinions expressed are the opinions      =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================
Received on Thursday, 1 October 2009 23:17:35 UTC