Re: Jim Melton: XML Query WG review of RIF Datatypes and Built-Ins 1.0 from Chris Welty on 2009-10-02 (public-rif-comments@w3.org from October 2009)

From: Chris Welty <cawelty@gmail.com>
Date: Thu, 1 Oct 2009 20:10:08 -0400
To: Jim Melton <jim.melton@oracle.com>
Cc: Sandro Hawke <sandro@w3.org>, Jim Melton <jim.melton@oracle.com>, "public-rif-comments@w3.org" <public-rif-comments@w3.org>, "w3c-xsl-query@w3.org" <w3c-xsl-query@w3.org>
Message-Id: <D53EECD0-E43C-4F93-82AE-BD60ED6D6965@gmail.com>
Jim,

Regarding the naming, does this help:

is-literal-not-string (x) is defined to be
isLiteral(x) && !isString(x)

Honestly I can't think of a more informative name for it.
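
To make the reading above concrete, here is a minimal Python sketch of the guard. It is only an illustration under stated assumptions: the DTS table and the names is_literal, is_string and is_literal_not_string are hypothetical stand-ins, not anything defined by RIF-DTB, and only a few datatypes are modelled.

```python
from decimal import Decimal

# Hypothetical stand-in for DTS: datatype shortname -> value-space membership test.
# Only a handful of datatypes are modelled here, purely for illustration.
DTS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "decimal": lambda v: isinstance(v, Decimal),
    "double": lambda v: isinstance(v, float),
}


def is_literal(x):
    """True if x is in the value space of at least one datatype in DTS."""
    return any(in_value_space(x) for in_value_space in DTS.values())


def is_string(x):
    """True if x is in the value space of the string datatype."""
    return DTS["string"](x)


def is_literal_not_string(x):
    """isLiteral(x) && !isString(x)"""
    return is_literal(x) and not is_string(x)
```

Under these assumptions, a value outside every value space in the table (for example a bare object()) makes the guard false, not because it is a string, but because it is not a literal at all.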

-Chris (sent from my iPhone)

On Oct 1, 2009, at 7:16 PM, Jim Melton <jim.melton@oracle.com> wrote:

> Sandro,
>
> I think we're about to reach closure ;^)
>
> At 9/30/2009 05:23 PM, Sandro Hawke wrote:
>> > >
>> > > > 6) In section 4.3, we learn that "Itruth Iexternal( ?arg1;
>> > > > pred:is-literal-not-DATATYPE ( ?arg1 ) )(s1) = t if and only if s1 is in
>> > > > the value space of one of the datatypes in DTS
>> > > > (http://www.w3.org/TR/rif-dtb/#sec-data-types) but not in the
>> > > > value space of the datatype with shortname DATATYPE, and f
>> > > > otherwise."  We believe that means that the predicate
>> > > > pred:is-literal-not-integer returns f if the value of its argument is not
>> > > > in the value space of any datatype in DTS!  If that is true, then it
>> > > > is highly misleading, because returning false implies that the value is a
>> > > > literal of type integer.  We recommend that you reconsider this
>> > > > definition so that the predicate returns true when the value is either
>> > > > (a) not in the value space of any datatype in DTS or (b) is in the value
>> > > > space of some data type in DTS but not in the value space of the
>> > > > specified datatype.
>> > >
>> > >We believe the definition as given is correct, but that the intended
>> > >meaning of negative guards was not clear.  We have added this note to
>> > >the end of section 4.3:
>> > >
>> > >"Note: The semantics of negative guards may be surprising. The
>> > >is-literal-not-String guard essentially asks, "Is this a literal, and
>> > >(if it is) is it something other than a String?" It could also be read
>> > >as "Is this a decimal or a float or a double or a date or a dateTime,
>> > >etc, [for every datatype except string] ?". The negative guards are
>> > >formulated like this to allow for rules which detect, for instance,
>> > >some kinds of bad inputs, while still using the open world assumption
>> > >of some RIF dialects."
>> > >
>> > >Hopefully, that's detailed enough to show that the definition is
>> > >correct.  A more-detailed explanation of why we can't provide
>> > >is-not-String seems out-of-scope for this document.
>> >
>> > Jim: Thanks for the explanation. I now understand why the predicate
>> > has the semantics that it does.  I must say, though, that I find the
>> > name itself unfortunate because of its counter-intuitiveness.  Full
>> > disclosure: I have long advised people to not depend on intuition or
>> > on Webster's Dictionary for the meaning of keywords and function names
>> > in a programming language, but to depend solely on the language
>> > spec.  This is obviously a case where I am not following my own
>> > advice.  But I also advise designers of languages to avoid
>> > consciously using counter-intuitive terms whenever possible.
>> >
>> > Jim: In spite of the conflicting tone of the preceding paragraph, I
>> > do not ask that you reconsider the name of the predicate, because
>> > there is great value in having consistency amongst the names used for
>> > similar purposes in a programming language and that consideration
>> > probably outweighs the counter-intuitiveness (which might not affect
>> > every reader anyway).
>>
>> Can you explain how the name seems counter-intuitive to you?  Given the
>> meaning (test that something is a literal and is not a string), it seems
>> to me that is-literal-not-String is pretty clear.  It could also be
>> is-literal-and-is-not-String, but that doesn't seem that much clearer.
>>
>> I guess I'm surprised that, knowing the meaning, you find the name to be
>> a problem.  That surprise makes me think there's some aspect of the
>> situation I'm missing.
>
> Maybe I'm still misunderstanding the semantics.  Our original
> comment said "We believe that means that the predicate
> pred:is-literal-not-integer returns f if the value of its argument is not in
> the value space of any datatype in DTS!"  If we substitute "String"
> for "integer", then it gets even more confusing (to me, at least).
> Consider a sequence/list/whatever of 16-bit values representing
> Unicode characters encoded in UTF-16.  If one of those 16-bit
> values, but not two consecutive such values, is a number
> corresponding to the space reserved in UTF-16 for surrogate pairs,
> then that value is not in the value space of any data type in DTS.
> Another example that doesn't depend on UTF-16 is the representation
> (in UTF-8, UTF-16, or UCS4) of the value "FFFF", which is defined by
> Unicode to not be a character.  By definition, it's not an
> xs:string, nor, I believe, a String.  (One might also ask if it is a
> literal.  In XQuery, XSLT, and XML Schema, I believe that it is not
> a literal, because it contains bits that do not represent anything
> valid.  But let's suppose that it is, in the sense that it's
> otherwise in the form of a literal value in the language.)
>
> It is, however, *clearly* not in the value space of *any* data type
> in DTS.  Therefore, the definition of the is-literal-not-String
> predicate is not satisfied; that definition requires that "if and
> only if s1 is in the value space of one of the datatypes in DTS"
> (http://www.w3.org/TR/rif-dtb/#sec-data-types).  Since that
> bit/byte/whatever sequence is not a valid value
> of any XML Schema data type, then it violates that prescription in
> the definition.  As a result, your "if and only if" has been
> violated, and thus the predicate should not return true.
>
> One solution would be to change the name of the predicate to
> "is-not-String", but that is not really what you're trying to accomplish (I
> think), besides which it violates the pattern chosen for the names
> of predicates in this category.  The other solution would be to
> clarify, at least for the negative guards, the phrase "if s1 is
> in the value space of one of the datatypes in DTS" from the
> definition.  I think the latter solution is preferable (and
> apologize for giving the impression in my previous response that it
> was specifically the name of the predicate that I found confusing).
>
>
>> > >
>> > > > 11) Section 4.11.1. Is it wise to number positions in a list starting
>> > > > from zero, while numbering characters within a string (for example, in
>> > > > the substring() function) from 1?  We think this inconsistency will
>> > > > confuse your readers and users.
>> > >
>> > >We struggled with this some more today, but decided to leave indexing
>> > >as is.  It's a really unfortunate situation, and we can't see any way
>> > >forward which won't confuse users.  Given that lists are substantially
>> > >different from xpath sequences, well, hopefully people will understand
>> > >and tolerate this approach.
>> >
>> > Jim: This is most unfortunate, and may be the only thing to which we
>> > might actually object.  Please note that our comment didn't use the
>> > example of sequences, because your language doesn't contain that
>> > concept; we used strings, which your language does have.  Yes, lists
>> > and character strings are not the same thing, but many application
>> > programmers will be familiar with treating strings as lists of
>> > characters.  Numbering lists of characters (that is, strings)
>> > starting with 1, but numbering other kinds of lists starting with 0
>> > is, in our opinion, likely to be a serious source of confusion and
>> > erroneous code.  We strongly urge you to further consider this
>> > decision.  Assuming that you will choose to publish the CR without
>> > making a change here, I must advise you that our WG might choose to
>> > make an additional comment on this point during your CR
>> > period.  (Personally, I sincerely doubt that any of our members would
>> > go so far as to raise a formal objection, so you need not fear that
>> > outcome.)
>> >
>> > Jim: If you do not make changes to resolve this concern, then please
>> > be sure that the spec clearly points out the different base value for
>> > the two kinds of positions.  That is probably best done with
>> > non-normative notes in two places -- where character strings are
>> > defined and where lists are defined, cross referencing one another.
>>
>> Perhaps a short-term solution is for us to mark the indexing (perhaps
>> for strings and lists?) as At Risk.  That lets us defer the decision
>> until the end of CR, and get feedback from implementors and the
>> community.  It would allow us to change this later, without a second
>> Last Call and second CR.
>
> I have no problem with this approach.  You should consider exactly  
> what is "At Risk", though.  Is it the functionality of indexing in  
> general, or the details of whether indexing is 0-based vs 1-based?   
> In my admittedly limited experience, it is entire functionalities  
> that one marks At Risk, not different approaches of specifying the  
> same functionality.  But I am not in any way opposed to your taking  
> this approach to this issue.
>
>
>> For myself, mostly programming Python these days, I find it odd that in
>> xpath and RIF I can't use the same functions on lists and strings
>> (treating strings as sequences of 1-character strings).  The fact that
>> concat is for strings and concatenate is for sequences/lists is
>> ... unfortunate.
>
> Having never programmed in Python, I can't identify as easily as  
> you, but I know other languages that behave much the same in this  
> respect.  I agree that the concat/concatenate dichotomy is  
> unfortunate.  However, in Functions & Operators, we do have the  
> fn:concat function, but do not have an *fn:* function by the name  
> fn:concatenate (yes, we have the op: definitional function of that  
> name, but that's not a problem in F&O directly, or in XQuery and  
> XSLT, since the op: functions are not available to application  
> writers).  But, because you've adopted both the fn: functions and  
> the op: functions, you do face that problem.  Apologies!
>
>> (I wonder if we could define our list builtins to also
>> work on strings, as if they were lists like that....  Ah well, maybe
>> it's too late for that.)
>
> Ummm...probably ;^)
>
> Hope this helps,
>   Jim
>
>
>> I'm hoping that a more elegant DTB 2.0 based on user experience won't be
>> too many years off.
>
> ========================================================================
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
>  Chair, W3C XML Query WG; XQX (etc.) editor       Fax : +1.801.942.3345
> Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
> 1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
> Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
> ========================================================================
> =  Facts are facts.   But any opinions expressed are the opinions      =
> =  only of myself and may or may not reflect the opinions of anybody   =
> =  else with whom I may or may not have discussed the issues at hand.  =
> ========================================================================
>
Received on Friday, 2 October 2009 00:11:03 UTC