[#valueTesting] equality and unknown datatypes

rdf:seeAlso <http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0047>

and the companion message to this one about xsd:date values.


== Situation

Consider the tests:

   1 != "a"^^:utype
   "A"^^:utype = "a"^^:utype

for some currently-unknown-to-the-processor :utype.

Now suppose the processor learns about :utype, and now knows that the lexical 
form "a" in datatype :utype is mapped to the value 1.


== Design 1 (interpretation of the current doc)

The current design is based on a fixed set of datatypes that the processor 
knows about when evaluating "=", "<" etc.  This also affects ORDER BY.

!= is defined to be not(=)
= is chosen based on the operators table in 11.4.

   1 != "a"^^:utype
   not(1 = "a"^^:utype)
   not(rdfterm-equals(1, "a"^^:utype))
   not(false)
   true

When the processor learns about the datatype :utype, the test remains true 
because rq23 specifically enumerates the types understood to be the XSD types:

http://www.w3.org/2001/sw/DataAccess/rq23/#operandDataTypes
     * xsd:string
     * xsd:double
     * xsd:float
     * xsd:decimal
     * xsd:integer
     * xsd:boolean
     * xsd:dateTime

together with the XSD datatype promotion rules (so all the types xsd:byte, 
xsd:short, xsd:int, xsd:unsignedByte etc).

Any other type can be tested by RDFterm-equals, at least, so there is a 
catch-all for "=" and "!=".

Adding knowledge about :utype does not change the sequence above: "=" is 
always going to be RDFterm-equal and ("A"^^:utype = "a"^^:utype) is false even 
if the lexical-to-value mapping maps "a" and "A" to the same value.

But it is not possible to ORDER BY new datatypes because "<" is not extensible.
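
As a rough illustration of Design 1, here is a minimal Python sketch. The names (Literal, CORE, rdfterm_equal, eq_design1, ne_design1) and the http://example/utype IRI are hypothetical, not taken from rq23, and only xsd:integer stands in for the core set.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str

# Hypothetical stand-in for the fixed core set: datatype IRI -> lexical-to-value parser.
# Under Design 1 this table never grows, whatever the processor learns later.
CORE = {XSD + "integer": int}

def rdfterm_equal(a, b):
    # Term equality: same lexical form and same datatype IRI.
    return a == b

def eq_design1(a, b):
    # Compare by value only when both operands are core types;
    # everything else falls through to RDFterm-equal.
    if a.datatype in CORE and b.datatype in CORE:
        return CORE[a.datatype](a.lexical) == CORE[b.datatype](b.lexical)
    return rdfterm_equal(a, b)

def ne_design1(a, b):
    # "!=" is defined as not("=").
    return not eq_design1(a, b)

one = Literal("1", XSD + "integer")
a = Literal("a", "http://example/utype")
A = Literal("A", "http://example/utype")

print(ne_design1(one, a))  # True
print(eq_design1(A, a))    # False
```

Because CORE is fixed, both results stay the same after the processor learns the lexical-to-value mapping for :utype.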


== Design 2 : Value testing where possible, default to RDFterm testing

This is an incremental change: the design principle is that testing is 
by-value wherever possible, but if it is not possible, RDFterm-equals is used. 
  This can be applied to new datatypes that are outside the core set.

It can also be applied to "<"  and then ORDER BY sorting by non-core datatypes 
is possible.

If we wish to keep monotonicity, there needs to be an RDFterm-notEquals test 
that is different from not(RDFterm-equals).

   1 != "a"^^:utype

is false (it is not positively known that they are different values).
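
One possible reading of Design 2, sketched in Python. All names here (Literal, value_of, eq_design2, ne_design2, the known table) are made up for illustration, and the :utype mapping that sends both "a" and "A" to 1 is an assumption taken from the scenario above, not a real datatype.

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
UTYPE = "http://example/utype"  # hypothetical IRI standing in for :utype

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str

def value_of(lit, known):
    """Return (True, value) when the datatype is known, else (False, None)."""
    parse = known.get(lit.datatype)
    if parse is None:
        return False, None
    return True, parse(lit.lexical)

def eq_design2(a, b, known):
    ok_a, va = value_of(a, known)
    ok_b, vb = value_of(b, known)
    if ok_a and ok_b:
        return va == vb   # by value where possible
    return a == b         # otherwise fall back to RDFterm-equals

def ne_design2(a, b, known):
    ok_a, va = value_of(a, known)
    ok_b, vb = value_of(b, known)
    # True only when the values are positively known to differ.
    return ok_a and ok_b and va != vb

one = Literal("1", XSD_INTEGER)
a = Literal("a", UTYPE)
A = Literal("A", UTYPE)

known = {XSD_INTEGER: int}
before = (ne_design2(one, a, known), eq_design2(A, a, known))

# Assumed mapping for :utype: "a" and "A" both map to the value 1.
known[UTYPE] = lambda lexical: {"a": 1, "A": 1}[lexical]
after = (ne_design2(one, a, known), eq_design2(A, a, known))

print(before)  # (False, False)
print(after)   # (False, True)
```

In this sketch ne_design2 is not simply the negation of eq_design2, which is the point of having a separate RDFterm-notEquals.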


== Design 3 : Value testing where possible, default to error

Another suggestion was testing by-value for "=", "!=" and "<", with an error 
in cases where the processor has no clue.

   1 != "a"^^:utype

is an error if :utype is unknown.  The only cases that work are

   "a"^^:utype = "a"^^:utype  ==> true
   "a"^^:utype != "a"^^:utype ==> false

because here the processor does know the results to be correct.
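
A small Python sketch of how Design 3 might behave. The names (Literal, UnknownDatatypeError, eq_design3, ne_design3) are hypothetical, and raising an exception is just one way to model "error".

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
UTYPE = "http://example/utype"  # hypothetical IRI standing in for :utype

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str

class UnknownDatatypeError(Exception):
    """Hypothetical error for comparisons the processor cannot decide."""

def eq_design3(a, b, known):
    if a.datatype in known and b.datatype in known:
        # Both datatypes understood: compare by value.
        return known[a.datatype](a.lexical) == known[b.datatype](b.lexical)
    if a == b:
        # Identical terms are treated as the same value, even for an unknown datatype.
        return True
    raise UnknownDatatypeError(f"cannot compare {a} with {b}")

def ne_design3(a, b, known):
    return not eq_design3(a, b, known)

known = {XSD_INTEGER: int}
one = Literal("1", XSD_INTEGER)
a = Literal("a", UTYPE)

print(eq_design3(a, a, known))   # True
print(ne_design3(a, a, known))   # False
try:
    ne_design3(one, a, known)
except UnknownDatatypeError:
    print("error")               # error
```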


== Design 4 : different operators

Specific and different operators for RDFterm-equals and RDFterm-notEquals so 
that "=" / "!=" only work on known datatypes otherwise there is an error.

Problem:

    FILTER (?x = <http://example/>)

no longer works (it's an operator type error) and will reject the solution. 
This will be very confusing as well as a significant change.

We could overload "=" to test on IRIs and blank nodes but there is still the 
problem that

    FILTER (?x = "xyz"^^:uType)

is illegal when everything else is legal.

Also:

   FILTER (RDFterm-equal(?x, 1))

does not work on values at all so:

   RDFterm-equal("1"^^xsd:byte, "1"^^xsd:integer)

is false (the graph match would be true if some D-entailment applies) which is 
confusing.  There is now no universal "do the best" kind of operation.

(This is a case for an IRI for every function so that the query can explicitly 
call RDFterm-equal.)
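
The split under Design 4 can be sketched in Python as below. The names (Literal, INTEGER_TYPES, rdfterm_equal, value_equal) are hypothetical, IRIs are modelled as plain Python strings, and only two integer types are covered.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str

# Small, assumed subset of the integer-derived types.
INTEGER_TYPES = {XSD + "integer", XSD + "byte"}

def rdfterm_equal(a, b):
    # Pure term comparison: no knowledge of values at all.
    return a == b

def value_equal(a, b):
    # "=" restricted to known datatypes; anything else is an operator type error.
    if (isinstance(a, Literal) and isinstance(b, Literal)
            and a.datatype in INTEGER_TYPES and b.datatype in INTEGER_TYPES):
        return int(a.lexical) == int(b.lexical)
    raise TypeError("operator type error")

byte_one = Literal("1", XSD + "byte")
int_one = Literal("1", XSD + "integer")

print(rdfterm_equal(byte_one, int_one))  # False
print(value_equal(byte_one, int_one))    # True

try:
    value_equal("http://example/", "http://example/")  # IRI = IRI
except TypeError:
    print("type error")                  # type error
```

Neither function alone gives the "do the best" behaviour: one ignores values, the other rejects IRIs.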


== Preference

My preference is design 2 or 3 because

1/ it opens up sorting by datatypes outside the core set.
2/ it opens up the possibility of
   "a"@en < "b"@en


== Other

Aside - please can we add an "=" and "!=" for xsd:string,
and explain how "=" works for literals with language tags.

The fact that "a"@en < "b"@en is a type error is opaque.  At least mention the 
idiom str("a"@en) < str("b"@en).
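
To show why the idiom helps, a toy Python model. LangLiteral, less_than and str_ are made-up names, simple literals are modelled as Python strings, and it is an assumption of this sketch that "<" is defined for simple literals only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LangLiteral:
    lexical: str
    lang: str

def less_than(a, b):
    # Assumption: "<" is defined only for simple literals (plain Python str here).
    if isinstance(a, str) and isinstance(b, str):
        return a < b
    raise TypeError("'<' is not defined for these operands")

def str_(term):
    # Modelled on SPARQL str(): drop the language tag, keep the lexical form.
    return term.lexical if isinstance(term, LangLiteral) else term

a_en = LangLiteral("a", "en")
b_en = LangLiteral("b", "en")

try:
    less_than(a_en, b_en)
except TypeError:
    print("type error")                     # type error

print(less_than(str_(a_en), str_(b_en)))    # True
```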

Received on Wednesday, 7 June 2006 12:53:58 UTC