Re: trade-offs for equivalence tests from Seaborne, Andy on 2006-08-22 (public-rdf-dawg@w3.org from July to September 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 22 Aug 2006 18:30:50 +0100
To: Eric Prud'hommeaux <eric@w3.org>
CC: public-rdf-dawg@w3.org
Message-ID: <44EB3F4A.2000002@hp.com>

Eric Prud'hommeaux wrote:
> The consensus during the 8 Aug telecon was that we should have the =
> operator serve for both value equivalence and node equivalence tests:
> 
>   value-eq:  "12.0"^^xsd:float = "12"^^xsd:integer
>   node-eq:   <foo> = <foo>

Just to be clear here:

"=" is a value equivalence on literals (not node equivalence).

Because "sameNode => same-value", it can be read as value-equivalence when:

   "asdf"^^foo:bar = "asdf"^^foo:bar

A datatype can't map the same lexical form to two different values so, even 
though the processor does not know about foo:bar, it does know the expression 
above is value-equal.  This is convenient: in implementation terms, "=" falls 
back to sameNode here, so the backstop "=" is:

sameNode : true => return true
sameNode : false => return error
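
A minimal sketch of that backstop in Python, for illustration only.  The names 
Literal, SparqlTypeError, same_node and backstop_eq are hypothetical and not 
taken from any implementation; node identity is assumed to mean "same lexical 
form and same datatype".

```python
from dataclasses import dataclass


class SparqlTypeError(Exception):
    """Hypothetical stand-in for the 'error' outcome of a filter expression."""


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def same_node(a: Literal, b: Literal) -> bool:
    # Assumption: two literals are the same node when both the lexical
    # form and the datatype IRI match exactly.
    return a == b


def backstop_eq(a: Literal, b: Literal) -> bool:
    """'=' for literals whose datatype the processor does not understand."""
    if same_node(a, b):
        # sameNode => same-value, so this is safe even for unknown datatypes.
        return True
    # Different nodes of an unknown datatype might still be value-equal,
    # so the only monotonic answer is an error.
    raise SparqlTypeError(f"cannot compare {a} and {b}")
```

Note that the function never returns False: for an unknown datatype the 
outcome is either true or an error.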

> The screw case is that one cannot use '=' to test to see if two
> strings of unknown type are *different*:

Checking: Meaning here specifically "node different" for the use case of data 
validation.

For known datatypes, neither "=" nor "!=" can test whether two lexical forms 
are node-same or node-different either.

> 
>   "asdf"^^foo:bar != "qwer"^^foo:bar  => type error
> 
> We need to either accept nonmonotonicity or add an operator to allow
> simple node-eq checking on literals of unsupported types. The
> former seems like a non-starter, so it seems we need an operator for
> the latter. Here are some options and their costs:
> 
>  -- Conservative --
> The syntax for the two operators is entirely distinct; you can't
> write "is this IRI the same as that one?" the same way you write "is
> this value numerically equivalent (value-eq) to that value?"
> 
> Advantages:
> - consistent user experience: operators always do the same thing.
> - more optimizable: node-eq treated just as pointer equivalence;
>   more heavy value-eq only invoked when the user specifically
>   demanded it.
> 
> Disadvantages:
> - you need more operators in mind
> - queries where you want to test *either* node-eq or value-eq need to
>   be specially worded with an ||
> 
> 
>  -- Liberal (outcome of the 8 Aug telecon) --
> The syntax for value-eq does double duty for all the possible (as
> limited by monotonicity) operands; you *can* test for IRI-eq, IRI-ne,
> bNode-eq, bNode-ne and literal-eq the same way you test for numeric
> equivalence. IRIs and bNodes equivalence can be tested with the
> node-eq operator *or* the value-eq operator.

The use of 'numeric' here confuses me.  Can you confirm we are talking about 
all value comparisons, not just numbers?  Dates, temperatures, etc. etc.  Even 
strange things that are only partially defined for "<" or "="

[A strange thing:
"(1, 0*pi)" = "(1, 2pi)"
"(1, pi)" = "(-1, 0*pi)"
for non-normalised (non-standard) polar complex numbers.]
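
A rough Python sketch of why those pairs are value-equal although their 
lexical forms differ.  The (r, theta) tuple representation, the function 
names and the tolerance are assumptions made for illustration; no such 
datatype is defined anywhere in this thread.

```python
import cmath
import math


def polar_to_value(r: float, theta: float) -> complex:
    # Map a (possibly non-normalised) polar pair into the value space.
    return complex(r * math.cos(theta), r * math.sin(theta))


def polar_value_eq(p, q, tol: float = 1e-9) -> bool:
    # Compare in the value space, allowing for floating-point rounding.
    return cmath.isclose(polar_to_value(*p), polar_to_value(*q), abs_tol=tol)


# "(1, 0*pi)" = "(1, 2pi)": same point, angles differ by a full turn.
same_after_full_turn = polar_value_eq((1, 0.0), (1, 2 * math.pi))

# "(1, pi)" = "(-1, 0*pi)": a negative radius flips the direction.
same_with_negative_radius = polar_value_eq((1, math.pi), (-1, 0.0))
```

Both comparisons come out true in the value space, while a sameNode test on 
the lexical forms would say the nodes differ.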

> 
> Advantages:
> - intuitive: for most cases, the one operator does what you need.
> 
> Disadvantages:
> - inconsistent: literal-ne different from the rest.

I need an example of inconsistency here.

   "asdf"^^foo:bar != "qwer"^^foo:bar  => type error
but
   "asdf"^^foo:bar = "qwer"^^foo:bar   => type error
which I see as consistent.


For literals, if "=" is defined as error on unknown unless same-node:

True:   "abc"^^:unknown = "abc"^^:unknown
      (same lexical form means same value regardless)
Error:  "abc"^^:unknown = "def"^^:unknown

and

False:   "abc"^^:unknown != "abc"^^:unknown
Error:   "abc"^^:unknown != "def"^^:unknown

!= is defined as !(=) when using errors.
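
A small Python sketch of that definition, with the error outcome modelled as 
a sentinel value rather than an exception.  ERROR, eq_unknown and negate are 
hypothetical names used only for this illustration.

```python
ERROR = "error"  # sentinel for the 'error' outcome


def eq_unknown(a, b):
    """'=' on (lexical, datatype) pairs of a datatype that is not understood."""
    if a == b:
        return True   # same node, therefore same value
    return ERROR      # different nodes: value equality cannot be decided


def negate(result):
    """'!' with errors: an error stays an error, otherwise flip the boolean."""
    return ERROR if result == ERROR else not result


def ne_unknown(a, b):
    # != is defined as !(=)
    return negate(eq_unknown(a, b))


abc = ("abc", ":unknown")
xyz = ("def", ":unknown")

results = {
    "abc = abc": eq_unknown(abc, abc),
    "abc = def": eq_unknown(abc, xyz),
    "abc != abc": ne_unknown(abc, abc),
    "abc != def": ne_unknown(abc, xyz),
}
```

The four entries in results line up with the True / Error / False / Error 
cases listed above.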

> - less optimizable: every value-eq test needs to do both a value test
>   and a node equivalence.
> 
> Finally, we should figure how to spell these two operators in the
> query. '=' seems like a popular choice for at least one.
> 
>  -- = / sameNode --
>   ?num1 = ?num2 && sameNode(?str1, ?str2)

+1 if we are covering more than just the numeric value space.

True or error: "2006-08-22"^^xsd:date  = "2006-08-22Z"^^xsd:date
   (true if xsd:date is understood, error if not)

false or error: "2006-08-22"^^xsd:date  != "2006-08-22Z"^^xsd:date
false or error: !("2006-08-22"^^xsd:date = "2006-08-22Z"^^xsd:date)

So it's monotonic as the processor learns about xsd:date.
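
A Python sketch of that monotonicity argument.  Everything here is 
hypothetical: the names, the table of understood datatypes, and in particular 
parse_date_assume_utc, which simply treats a date without a timezone as UTC so 
that the two lexical forms above map to the same value.  That mapping is an 
assumption for the sake of the example, not a statement about XSD.

```python
ERROR = "error"  # sentinel for the 'error' outcome


def parse_date_assume_utc(lexical):
    # Assumption for illustration only: a missing timezone means UTC,
    # so a trailing "Z" does not change the value.
    return lexical[:-1] if lexical.endswith("Z") else lexical


def eq(a, b, known):
    """a, b: (lexical, datatype) pairs; known: datatype -> lexical-to-value."""
    if a == b:
        return True  # same node, therefore same value
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a == dt_b and dt_a in known:
        to_value = known[dt_a]
        return to_value(lex_a) == to_value(lex_b)
    return ERROR  # simplification: anything else is left undecided


def ne(a, b, known):
    result = eq(a, b, known)
    return ERROR if result == ERROR else not result


d1 = ("2006-08-22", "xsd:date")
d2 = ("2006-08-22Z", "xsd:date")

before = (eq(d1, d2, {}), ne(d1, d2, {}))
after = (eq(d1, d2, {"xsd:date": parse_date_assume_utc}),
         ne(d1, d2, {"xsd:date": parse_date_assume_utc}))
print(before, after)  # ('error', 'error') (True, False)
```

Learning about xsd:date only turns an error into a definite answer; no answer 
that was already true or false changes.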

> 
>  -- = / == --
>   ?num1 = ?num2 && ?str1 == ?str2

-1

> 
>  -- == / = --
>   ?num1 == ?num2 && ?str1 = ?str2

-1

> 
> Andy says that having both eq and = in RDQL led to user confusion.

It's a perl-ism (or test(1)-ism) - people without that perspective were confused.

> Given that we want monotonicity, I don't think we can avoid having two
> operators. The question is infix/function, how to spell them, and how
> liberal to make the value-eq operator.

Could you give examples of where we have any other choices for value-eq on 
literals while retaining monotonicity?  The only one I see is whether:

   "abc"^^:unknown = "abc"^^:unknown

is true or error and as discussed above it can be true even for unknown 
datatypes using "sameNode => sameValue".

I'll turn these into test cases for my action item - if there are other cases, 
if you note them, I'll cover those as well.

	Andy
Received on Tuesday, 22 August 2006 17:31:42 UTC