Re: Preliminary material for section "value testing" from Steve Harris on 2004-10-13 (public-rdf-dawg@w3.org from October to December 2004)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Wed, 13 Oct 2004 23:25:04 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <20041013222504.GA16329@login.ecs.soton.ac.uk>
On Wed, Oct 13, 2004 at 06:49:14 +0100, Andy Seaborne wrote:
> >Agree about the oddity of having datatype processing in one place and not
> >the other being potentially confusing, but they are different,
> 
> They *may* be different - that is the decision we have to make!

Syntactically they are different, whether or not they are semantically.
 
> On one side (user/application centric), Why should pattern matching of XML 
> datatypes be different to numeric-equals?  On the other side, what's the 
> implementation impact? Does it apply to everyone?
> 
> >and the
> >implementation complexity of handling datatype objects specially is not
> >small. Also if we require that triple expressions do datatype manipulation
> >then we may require some other means to explicitly turn it off. e.g.
> >1.00000000001f != 1.
> >
> >op:numeric-equal doesn't discuss floating point formats, the result of
> >op:numeric-equal(a, b) where a and b are fp should probably be undefined,
> >with formats like IEEE 754 it is not really possible to answer, and without
> >requiring SPARQL implementations to provide their own software fp bit
> >operations we can't sensibly require a given behaviour. Each VM/processor
> >will give different results (if it's even supported).
> 
> I am proposing that the semantics of the operations would be as defined in 
> XPath/Xquery Functions and Operators (F&O).
> 
> F&O includes NaNs, -INF, +INF, -0 and +0 from IEEE 754-1985.  Databases 
> support this don't they?

What about denormals? Also there are several NaNs in IEEE, though I don't
think RDF can represent those anyway. In practical terms, determining if
one float is equal to another in any meaningful sense is extremely hard.
(SQL) databases do offer == on floats, but I don't think the result is any
more meaningful than in C, FORTRAN, or most other systems. I don't have a
copy of the SQL '92 spec to check though. There is a general note about
the difficulties of comparing floats here:
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
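
To make the concern concrete, here is a minimal Python sketch (not anything
F&O or SPARQL defines). It assumes the host uses IEEE 754 doubles, which is
true of CPython on common platforms; tolerant_equal is a hypothetical helper
name and the 1e-9 tolerance is an arbitrary choice for illustration.

```python
import math
import sys


def tolerant_equal(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Hypothetical helper: equality within a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


# Exact == on doubles: the sum carries a rounding error.
total = 0.1 + 0.2
exact_match = (total == 0.3)
close_match = tolerant_equal(total, 0.3)

# NaN never compares equal, not even to itself.
nan = float("nan")
nan_equals_itself = (nan == nan)

# Halving the smallest normal double yields a denormal (subnormal),
# which is still a distinct non-zero value.
smallest_normal = sys.float_info.min
denormal = smallest_normal / 2
denormal_is_nonzero = denormal > 0.0

print(exact_match, close_match, nan_equals_itself, denormal_is_nonzero)
```

Which tolerance is "right" depends on the application, which is part of why
a single mandated answer for float equality is hard to specify.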

It's fine to allow it (as C does), as long as the result is undefined. Java
attempted to provide exact reproducibility between architectures, with
notable failure, cf. http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
 
> My reading was that op:numeric-equal applies to floats and doubles.

It's no problem to allow it as long as you never test or specify it :)
 
> >ditto with some of the Functions on Strings: fn:substring-before,
> >fn:substring-after, 
> 
> I believe it helps in optimization, else it's regular expression tests for 
> these.

OK, that seems reasonable. I still have a preference for fewer functions
rather than more though.
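
As a rough illustration of the trade-off, the following Python sketch
implements the behaviour of fn:substring-before two ways: as a direct string
operation and as a regular expression test. Both function names are
hypothetical, and the empty-string handling follows my reading of F&O
(zero-length result when the second argument is empty or not found).

```python
import re


def substring_before(text: str, sep: str) -> str:
    """Direct string operation; easy for an engine to optimize."""
    if not sep:
        return ""
    head, found, _tail = text.partition(sep)
    return head if found else ""


def substring_before_regex(text: str, sep: str) -> str:
    """The same result expressed as a regular expression test."""
    if not sep:
        return ""
    # Non-greedy prefix up to the first occurrence of the escaped separator.
    match = re.match("(.*?)" + re.escape(sep), text, flags=re.DOTALL)
    return match.group(1) if match else ""


print(substring_before("tattoo", "attoo"))
print(substring_before_regex("tattoo", "attoo"))
```

The dedicated function needs no regex engine at all, while the regex form
gets by with one fewer function in the language.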

> > fn:string-join (seems like an array operator to me),
> >fn:normalize-space, fn:normalize-unicode (ouch!),  fn:escape-uri (hard to
> >define as many systems will want to use the underlying URL escaping
> >features of their environment, not make one to the SPARQL spec).
> 
> Manipulations could go.
> 
> (There are no operations in rq23/ that involve XML sequences by the way).

OK, my reading of some of them was that they did, but I don't understand
the XQuery vocab.
 
> >fn:matches states perl5 regex, but that seems a bit onerous for systems
> >that are e.g. based on JavaScript, building a complete perl5 regex engine
> >seems like too much work. POSIX would be more reasonable IMHO.
> 
> We should go for whatever XML Schema datatypes goes for.
> 
> [[ F&O:
> The regular expression syntax used by these functions is defined in terms 
> of the regular expression syntax specified in XML Schema (see [XML Schema 
> Part 2: Datatypes]), which in turn is based on the established conventions 
> of languages such as Perl. However, because XML Schema uses regular 
> expressions only for validity checking, it omits some facilities that are 
> widely-used with languages such as Perl. This section, therefore, describes 
> extensions to the XML Schema regular expressions syntax that reinstate 
> these capabilities.
> ]]
> 
> I haven't looked recently - what are the differences here?

Perl 5 has a non-greedy operator form, e.g. (.*?); it's very useful, but adds
a lot to the complexity of the engine, as I understand it. POSIX just
seems like a more stable standard to me than whatever perl 5 happens to
say this month. I can see the value of aligning with XML Schema though.
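
A small sketch of the greedy/non-greedy difference, using Python's re module
purely for illustration. Python's syntax is Perl-style, so this is an
assumption-laden stand-in and not the XML Schema or POSIX dialect. The last
pattern shows the usual workaround with a negated character class, which
needs no non-greedy operator.

```python
import re

text = "<a><b>"

# Greedy: .* takes as much as it can, then backs off to the last ">".
greedy = re.search(r"<(.*)>", text).group(1)

# Non-greedy (Perl 5 style): .*? stops at the first ">".
lazy = re.search(r"<(.*?)>", text).group(1)

# Workaround without the non-greedy operator: exclude ">" from the match.
negated = re.search(r"<([^>]*)>", text).group(1)

print(greedy, lazy, negated)
```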
 
> >isBound seems like an unnecessary departure from SQL's IS NULL, but I'm not
> >that bothered, just mentioning it.
> 
> As discussed on IRC, there are no nulls to help with the issues around 
> comparing nulls.  An SQL implementation may choose to use SQL NULLs 
> internally if that gets the right answers.  Defining in terms of NULLs for 
> non-SQL engines is unnecessary.

It doesn't have to be NULL, it could be any symbol, but I would prefer that
there was one myself. Probably just a matter of taste.
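
To show the two options side by side, here is a hypothetical Python sketch.
It assumes a query solution is modelled as a dict from variable name to
value; UNBOUND, is_bound and is_bound_with_symbol are illustrative names,
not anything from the drafts.

```python
# Option 1: no null value at all; an unbound variable is simply absent.
def is_bound(solution: dict, var: str) -> bool:
    return var in solution


# Option 2: an explicit symbol marks "no value" (it need not be SQL NULL).
UNBOUND = object()


def is_bound_with_symbol(row: dict, var: str) -> bool:
    return row[var] is not UNBOUND


solution = {"name": "Steve"}
row = {"name": "Steve", "mbox": UNBOUND}

print(is_bound(solution, "mbox"), is_bound_with_symbol(row, "mbox"))
```

Both give the same answers here; the difference is only whether the language
exposes a symbol that can appear in comparisons.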

- Steve
Received on Wednesday, 13 October 2004 22:25:08 UTC