Re: Preliminary material for section "value testing" from Seaborne, Andy on 2004-10-13 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 13 Oct 2004 18:49:14 +0100
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <416D6A9A.50505@hp.com>
Steve Harris wrote:
> On Tue, Oct 12, 2004 at 01:58:47 +0100, Andy Seaborne wrote:
> 
>>I have put some very preliminary text about value testing in:
>>
>>   http://jena.hpl.hp.com/~afs/value-testing.html
> 
> 
> Overall seems good, but I'm concerned about the number of functions and 
> the complexity of some of them.

My experience (I macro-generate operations) is that the number of operations is 
not the big issue.  What is harder is getting exact datatype evaluation.  That 
is, it's the engine, not functions, where most of the work goes.

> 
> Agree about the oddity of having datatype processing in one place and not
> the other being potentially confusing, but they are different,

They *may* be different - that is the decision we have to make!

On one side (user/application centric), why should pattern matching of XML 
datatypes be different to numeric-equals?  On the other side, what's the 
implementation impact? Does it apply to everyone?

> and the
> implementation complexity of handling datatype objects specially is not
> small. Also if we require that triple expressions do datatype manipulation
> then we may require some other means to explicitly turn it off. e.g.
> 1.00000000001f != 1.
> 
> op:numeric-equal doesn't discuss floating point formats, the result of
> op:numeric-equal(a, b) where a and b are fp should probably be undefined,
> with formats like IEEE 754 it is not really possible to answer, and without
> requiring SPARQL implementations to provide their own software fp bit
> operations we can't sensibly require a given behaviour. Each VM/processor
> will give different results (if it's even supported).

I am proposing that the semantics of the operations would be as defined in 
XPath/XQuery Functions and Operators (F&O).

F&O includes NaNs, -INF, +INF, -0 and +0 from IEEE 754-1985.  Databases support 
this, don't they?
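
To make the single-versus-double point concrete, here is a minimal Python sketch. It assumes the "f" suffix in the quoted 1.00000000001f means single precision; the helper name to_float32 is hypothetical and only for illustration. It rounds a value to IEEE 754 single precision via the struct module and compares the result with 1.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# The literal from the quoted text, read as a double and as a single.
as_double = 1.00000000001
as_single = to_float32(1.00000000001)

# A double has enough precision to tell the value apart from 1.
print(as_double == 1.0)   # False

# A single does not: the nearest float32 to 1.00000000001 is exactly 1.0.
print(as_single == 1.0)   # True
```

So whether the two literals compare equal depends on which datatype the engine evaluates them in, which is the kind of exact datatype evaluation issue mentioned earlier.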


My reading was that op:numeric-equal applies to floats and doubles.

[[
6.3.1 op:numeric-equal
op:numeric-equal($arg1 as numeric, $arg2 as numeric) as xs:boolean

Summary: Returns true if and only if the value of $arg1 is equal to the value of 
$arg2. For xs:float and xs:double values, positive zero and negative zero 
compare equal. NaN does not equal itself.
]]

(Hmm, there is no "not-equal", but NaN is not equal nor not not-equal to itself.)
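
The summary quoted above can be checked against a host language that already follows IEEE 754 comparison rules. The following is a sketch only, assuming Python floats stand in for xs:double; numeric_equal is a hypothetical stand-in for op:numeric-equal, not an implementation of F&O.

```python
import math


def numeric_equal(a: float, b: float) -> bool:
    """Stand-in for op:numeric-equal on doubles; relies on Python's IEEE 754 '=='."""
    return a == b


nan = float("nan")

# Positive zero and negative zero compare equal.
print(numeric_equal(0.0, -0.0))            # True

# NaN does not equal itself.
print(numeric_equal(nan, nan))             # False

# Infinities of the same sign compare equal.
print(numeric_equal(math.inf, math.inf))   # True

# Negating the equality test gives True for NaN against itself.
print(not numeric_equal(nan, nan))         # True
```

The last line shows what happens if "not-equal" is defined simply as the negation of equality: NaN then counts as not-equal to itself.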

> 
> I forget the situation with <= and >=, but I think they may be OK.
> 
> fn:round-half-to-even() is pretty obscure, and may not be provided by many
> environments.

One down.

> 
> ditto with some of the Functions on Strings: fn:substring-before,
> fn:substring-after, 

I believe it helps in optimization; otherwise it's regular expression tests for these.
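
As a rough illustration of that trade-off, here is a Python sketch of the two functions as plain string operations, next to a regular-expression version of the first. The function names are hypothetical, the optional collation argument in F&O is ignored, and the handling of the empty separator follows my reading of F&O rather than anything checked against the current draft.

```python
import re


def substring_before(s: str, sep: str) -> str:
    """Part of s before the first occurrence of sep, or "" if sep is absent or empty."""
    if sep == "":
        return ""
    head, found, _tail = s.partition(sep)
    return head if found else ""


def substring_after(s: str, sep: str) -> str:
    """Part of s after the first occurrence of sep; s itself if sep is empty."""
    if sep == "":
        return s
    _head, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before_regex(s: str, sep: str) -> str:
    """Same result as substring_before, but going through the regex engine."""
    if sep == "":
        return ""
    match = re.search(re.escape(sep), s)
    return s[:match.start()] if match else ""


print(substring_before("tattoo", "attoo"))        # t
print(substring_after("tattoo", "tat"))           # too
print(substring_before_regex("tattoo", "attoo"))  # t
```

The direct versions are simple substring searches, which an engine can recognise and optimise more easily than an arbitrary regular expression.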

> fn:string-join (seems like an array operator to me),
> fn:normalize-space, fn:normalize-unicode (ouch!),  fn:escape-uri (hard to
> define as many systems will want to use the underlying URL escaping
> features of their environment, not write one to the SPARQL spec).

Manipulations could go.

(There are no operations in rq23/ that involve XML sequences by the way).

> 
> fn:matches states perl5 regex, but that seems a bit onerous for systems
> that are e.g. based on JavaScript, building a complete perl5 regex engine
> seems like too much work. POSIX would be more reasonable IMHO.

We should go for whatever XML Schema datatypes goes for.

[[ F&O:
The regular expression syntax used by these functions is defined in terms of the 
regular expression syntax specified in XML Schema (see [XML Schema Part 2: 
Datatypes]), which in turn is based on the established conventions of languages 
such as Perl. However, because XML Schema uses regular expressions only for 
validity checking, it omits some facilities that are widely-used with languages 
such as Perl. This section, therefore, describes extensions to the XML Schema 
regular expressions syntax that reinstate these capabilities.
]]

I haven't looked recently - what are the differences here?
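
One difference that can be shown without either engine to hand is anchoring: an XML Schema pattern has to match the whole lexical value, while F&O reinstates "^" and "$" so that an unanchored pattern may match any substring. The sketch below uses Python's re module purely as a stand-in; its syntax is neither XML Schema's nor F&O's, and the sample pattern and text are made up for illustration.

```python
import re

pattern = r"[0-9]+"
text = "item 42"

# XML Schema style: the pattern is implicitly anchored at both ends,
# so the whole string must match.
print(re.fullmatch(pattern, text) is not None)       # False

# F&O fn:matches style: an unanchored pattern matches if any substring does.
print(re.search(pattern, text) is not None)          # True

# With explicit anchors the substring search behaves like the whole-string match.
print(re.search(r"^[0-9]+$", text) is not None)      # False
```

This covers only anchoring; the other facilities the F&O text mentions as extensions are not shown here.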

> 
> isBound seems like an unnecessary departure from SQL's IS NULL, but I'm not
> that bothered, just mentioning it.

As discussed on IRC, there are no nulls to help with the issues around comparing 
nulls.  An SQL implementation may choose to use SQL NULLs internally if that 
gets the right answers.  Defining in terms of NULLs for non-SQL engines is 
unnecessary.
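
A small sketch of what defining the test without NULLs could look like in a non-SQL engine. Everything here is an assumption for illustration: a query solution is modelled as a Python dict from variable names to values, an unbound variable is simply absent from the dict, and is_bound is a hypothetical name.

```python
from typing import Dict

# A solution maps variable names to their bound values.
Solution = Dict[str, str]


def is_bound(solution: Solution, var: str) -> bool:
    """True if the variable has a value in this solution; no NULL value is involved."""
    return var in solution


# ?name is bound, ?mbox was left unbound by the pattern match.
row: Solution = {"name": "Alice"}

print(is_bound(row, "name"))   # True
print(is_bound(row, "mbox"))   # False
```

There is never a NULL value to compare against other values, so the questions about comparing nulls do not arise in this model.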

	Andy

> 
> - Steve
>
Received on Wednesday, 13 October 2004 17:50:01 UTC