Re: Jim Melton: XML Query WG review of RIF Datatypes and Built-Ins 1.0

> At 9/30/2009 05:23 PM, Sandro Hawke wrote:
> > > >
> > > > > 6) In section 4.3, we learn that "Itruth Iexternal( ?arg1;
> > > > > pred:is-literal-not-DATATYPE ( ?arg1 ) )(s1) = t if and only if
> > > > > s1 is in the value space of one of the datatypes in DTS
> > > > > <http://www.w3.org/TR/rif-dtb/#sec-data-types> but not in the
> > > > > value space of the datatype with shortname DATATYPE, and f
> > > > > otherwise."  We believe that means that the predicate
> > > > > pred:is-literal-not-integer returns f if the value of its
> > > > > argument is not in the value space of any datatype in DTS!  If
> > > > > that is true, then it is highly misleading, because returning
> > > > > false implies that the value is a literal of type integer.  We
> > > > > recommend that you reconsider this definition so that the
> > > > > predicate returns true when the value is either (a) not in the
> > > > > value space of any datatype in DTS or (b) is in the value space
> > > > > of some data type in DTS but not in the value space of the
> > > > > specified datatype.
> > > >
> > > >We believe the definition as given is correct, but that the intended
> > > >meaning of negative guards was not clear.  We have added this note to
> > > >the end of section 4.3:
> > > >
> > > >"Note: The semantics of negative guards may be surprising. The
> > > >is-literal-not-String guard essentially asks, "Is this a literal, and
> > > >(if it is) is it something other than a String?" It could also be read
> > > >as "Is this a decimal or a float or a double or a date or a dateTime,
> > > >etc, [for every datatype except string] ?". The negative guards are
> > > >formulated like this to allow for rules which detect, for instance,
> > > >some kinds of bad inputs, while still using the open world assumption
> > > >of some RIF dialects."
> > > >
> > > >Hopefully, that's detailed enough to show that the definition is
> > > >correct.  A more-detailed explanation of why we can't provide
> > > >is-not-String seems out-of-scope for this document.
> > >
> > > Jim: Thanks for the explanation. I now understand why the predicate
> > > has the semantics that it does.  I must say, though, that I find the
> > > name itself unfortunate because of its counter-intuitiveness.  Full
> > > disclosure: I have long advised people to not depend on intuition or
> > > > on Webster's Dictionary for the meaning of keywords and function names
> > > in a programming language, but to depend solely on the language
> > > spec.  This is obviously a case where I am not following my own
> > > advice.  But I also advise designers of languages to avoid
> > > consciously using counter-intuitive terms whenever possible.
> > >
> > > Jim: In spite of the conflicting tone of the preceding paragraph, I
> > > do not ask that you reconsider the name of the predicate, because
> > > there is great value in having consistency amongst the names used for
> > > similar purposes in a programming language and that consideration
> > > probably outweighs the counter-intuitiveness (which might not affect
> > > every reader anyway).
> >
> >Can you explain how the name seems counter-intuitive to you?  Given the
> >meaning (test that something is a literal and is not a string), it seems
> >to me that is-literal-not-String is pretty clear.  It could also be
> >is-literal-and-is-not-String, but that doesn't seem that much clearer.
> >
> >I guess I'm surprised that, knowing the meaning, you find the name to be
> >a problem.  That surprise makes me think there's some aspect of the
> >situation I'm missing.
> 
> Maybe I'm still misunderstanding the semantics.  Our original comment 
> said "We believe that means that the predicate 
> pred:is-literal-not-integer returns f if the value of its argument is 
> not in the value space of any datatype in DTS!"  If we substitute 
> "String" for "integer", then it gets even more confusing (to me, at 
> least).  Consider a sequence/list/whatever of 16-bit values 
> representing Unicode characters encoded in UTF-16.  If one of those 
> 16-bit values, but not two consecutive such values, is a number 
> corresponding to the space reserved in UTF-16 for surrogate pairs, 
> then that value is not in the value space of any data type in 
> DTS.  Another example that doesn't depend on UTF-16 is the 
> representation (in UTF-8, UTF-16, or UCS4) of the value "FFFF", which 
> is defined by Unicode to not be a character.  By definition, it's not 
> an xs:string, nor, I believe, a String.  (One might also ask if it is 
> a literal.  In XQuery, XSLT, and XML Schema, I believe that it is not 
> a literal, because it contains bits that do not represent anything 
> valid.  But let's suppose that it is, in the sense that it's 
> otherwise in the form of a literal value in the language.)
> 
> It is, however, *clearly* not in the value space of *any* data type 
> in DTS.  Therefore, the definition of the is-literal-not-String 
> predicate is not satisfied; that definition requires that "if and 
> only if s1 is in the value space of one of the datatypes in 
> DTS <http://www.w3.org/TR/rif-dtb/#sec-data-types>".  Since that 
> bit/byte/whatever sequence is not a valid value of any XML Schema 
> data type, then it violates that prescription in the definition.  As 
> a result, your "if and only if" has been violated, and thus the 
> predicate should not return true.
> 
> One solution would be to change the name of the predicate to 
> "is-not-String", but that is not really what you're trying to 
> accomplish (I think), besides which it violates the pattern chosen 
> for the names of predicates of this category.  The other solution 
> would be to clarify that, at least for the negative guards, the 
> phrase "if s1 is in the value space of one of the datatypes in DTS" 
> is dropped from the definition.  I think the latter solution is 
> preferable (and 
> apologize for giving the impression in my previous response that it 
> was specifically the name of the predicate that I found confusing).

Consider:

    is-literal-not-int("hello"^^xs:int)

I don't think that's well-formed, so the semantics don't apply.

Similarly, in your FFFF example, you say "let's suppose that it is [a
literal]".  But I think the point is that it's not a literal.  As the
term "literal" is understood here, it means exactly all the things that
are in the value spaces of DTS.
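
To make that reading concrete, here is a rough sketch in Python (not
RIF) of the negative guard as I understand it.  It is only an
illustration under stated assumptions: the two-entry DTS, the NotInDTS
class, and the helper names are all made up for the example, and the
membership tests are crude stand-ins for the real value spaces.

```python
# Toy model of a negative guard; NOT the RIF-DTB definition itself.
# DTS here is a hypothetical two-datatype set; the real DTS is larger.

class NotInDTS:
    """Stands in for a domain element outside every DTS value space."""


def in_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (isinstance(v, int) and not isinstance(v, bool)
            and -2**31 <= v < 2**31)


def in_string(v):
    # Crude stand-in for the string value space.
    return isinstance(v, str)


DTS = {"int": in_int, "string": in_string}


def is_literal_not(datatype, value):
    """True iff value is in some DTS value space but not in datatype's."""
    in_some = any(member(value) for member in DTS.values())
    return in_some and not DTS[datatype](value)


print(is_literal_not("int", "hello"))     # a string value: True
print(is_literal_not("int", 3))           # an int value: False
print(is_literal_not("int", NotInDTS()))  # outside every DTS space: False
```

The last call is the case under discussion: the guard is false for
something outside DTS, not because it is an int, but because it is not
a data value of any DTS datatype at all.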

So maybe the problem here is about the term 'literal'.  Michael
Sperberg-McQueen mentioned he thought that term was wrong.

Would better naming here be "is-datavalue-not-int"?  Or
"is-value-of-datatype-other-than-int"?

That still doesn't suggest that we're only talking about datavalues in
DTS, but perhaps it's better.

I expect people will use the function like this:

   ?x = "3"^^int
   ...
   if is-literal-not-int(?x) then ...

or 

   if is-literal-not-int(eg:some-function(?x)) then ...

and I can see how "datavalue" might be a more precise word here than
"literal".  The word "literal" is about the syntax.  At that level, in
the first case, the argument is not a literal, it's a variable.  In
the second case, the argument is not a literal, it's a function-call.

I guess that's why I think "literal" is okay, because of course we're
not operating at that syntactic level.  Since it's nonsense to read it
(ahem) literally, one should understand it to mean "value-of-literal".

If there were no cost to change it, I could probably be convinced
"is-datavalue-not-int" or "is-value-of-datatype-not-int" would be better
names; I'm not sure it's worth it at this point.  (In fact, I think we
settled on "literal" very quickly, because we were so tired of trying to
figure out the semantics on a function that isn't even that important.)

Anyway, have I at least understood your point?

     -- Sandro (obviously speaking for myself, and not consulting the group)

(I'm quite amused that our case of metonymy here is about the word
"literal".)

Received on Friday, 2 October 2009 02:13:12 UTC