UNSAID - two test cases from Seaborne, Andy on 2004-12-22 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 22 Dec 2004 10:26:12 +0000
To: 'RDF Data Access Working Group' <public-rdf-dawg@w3.org>
Message-ID: <41C94BC4.4030409@hp.com>

Two test cases: data, queries, results and manifest.ttl attached.

The first is based on the dawg-comments message, with multiple triples with 
the same property; the second is Steve's "exactly one email address" done 
with UNSAID and also with OPTIONAL.

 Andy


==== Case 1: selecting a resource based on the absence of a type triple:

A question like this has just been asked on jena-dev.

"Find all the things that are not of a given class."

Suppose the data is:

   <x> rdf:type :foo .
   <x> rdf:type :bar .

then it needs something like:

     UNSAID(?thing rdf:type :foo)

Doing this without UNSAID might take two queries, with the application
calculating the set difference itself: find all ?x (Q1), find all ?x which
have type :foo (Q2), then take every ?x in Q2 out of Q1.

Turning into a test case:

== Data 1:
@prefix : <http://example.org/ns#> .

:x a :foo .
:x a :bar .

:y a :bar .

== Query 1:
PREFIX :    <http://example.org/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?r
WHERE (?r rdf:type ?type)        # Get all the things with a type
       UNSAID (?r rdf:type :foo) # ... then drop those typed :foo

== Result Set 1:
------
| r  |
======
| :y |
------
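
For comparison, here is a minimal sketch of the two-query workaround
described above, run over Data 1. It assumes the triples are held as plain
Python tuples; the names triples, q1, q2 and not_foo are illustrative only
and are not part of Jena or any other API.

```python
# Data 1 as (subject, predicate, object) tuples; prefixed names kept as strings.
triples = [
    (":x", "rdf:type", ":foo"),
    (":x", "rdf:type", ":bar"),
    (":y", "rdf:type", ":bar"),
]

# Q1: every resource that has some type.
q1 = {s for (s, p, o) in triples if p == "rdf:type"}

# Q2: every resource that has type :foo.
q2 = {s for (s, p, o) in triples if p == "rdf:type" and o == ":foo"}

# The application has to compute the set difference itself.
not_foo = q1 - q2
print(sorted(not_foo))  # [':y']
```

This gives the same answer as Result Set 1, but needs two round trips and
client-side code where UNSAID needs one query.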

==== Case 2:

This is a great test case from Steve:
"find all the people with exactly one known email address"

might be:

SELECT ?person
WHERE
   (?person :email ?e1)
   UNSAID { (?person :email ?e2) AND ?e2 NE ?e1 }

but OPTIONAL plus a constraint can also do it:

PREFIX : <http://example.org/ns#>

SELECT ?person
WHERE  (?person :email ?e1)
    OPTIONAL { (?person :email ?e2) AND ?e2 NE ?e1 }
    AND ! &dawg:bound(?e2)

This uses the same method Steve had: match with OPTIONAL, then invert the
sense of the match with dawg:bound.


== Data 2:
@prefix : <http://example.org/ns#> .

:p1 :email <mailto:p1e1> .
:p1 :email <mailto:p1e2> .

:p2 :email <mailto:me> .

== Query 2.1:
PREFIX : <http://example.org/ns#>

SELECT ?person
WHERE  (?person :email ?e1)
        UNSAID { (?person :email ?e2) AND ?e2 NE ?e1 }

== Result Set 2.1:
----------
| person |
==========
| :p2    |
----------

((
I was surprised that this worked at all.  On reflection, I can see it being
done with something like a SQL subquery using NOT IN.

See MySQL manual "13.1.8.3 Subqueries with ANY, IN, and SOME"
http://dev.mysql.com/doc/mysql/en/Subqueries.html
))
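
A rough sketch of that SQL analogy, assuming Data 2 is loaded into a single
two-column table. The table name email and its columns person and addr are
hypothetical, and this uses SQLite through Python's sqlite3 module rather
than MySQL, so treat it as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (person TEXT, addr TEXT)")
conn.executemany(
    "INSERT INTO email VALUES (?, ?)",
    [("p1", "mailto:p1e1"), ("p1", "mailto:p1e2"), ("p2", "mailto:me")],
)

# Keep a person only if no *other* address exists for them:
# the correlated subquery plays the role of the UNSAID block.
rows = conn.execute(
    """
    SELECT DISTINCT e1.person
    FROM email AS e1
    WHERE e1.person NOT IN (
        SELECT e2.person
        FROM email AS e2
        WHERE e2.person = e1.person
          AND e2.addr <> e1.addr
    )
    """
).fetchall()

people = [r[0] for r in rows]
print(people)  # ['p2']
```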


== Query 2.2:
I haven't implemented functions yet so I had to try this and eyeball the
results:

PREFIX : <http://example.org/ns#>
SELECT ?person ?e1 ?e2
WHERE  (?person :email ?e1)
        OPTIONAL { (?person :email ?e2) AND ?e2 NE ?e1 }

== Result Set 2.2:
------------------------------------------
| person | e1            | e2            |
==========================================
| :p1    | <mailto:p1e2> | <mailto:p1e1> |
| :p1    | <mailto:p1e1> | <mailto:p1e2> |
| :p2    | <mailto:me>   |               |
------------------------------------------

so testing for ?e2 unbound would get just :p2

Note the excessive hits on :p1: asking for :email twice causes combinatorial
growth.  Adding a third email for :p1 gives 6 rows for :p1, and so on.
It could be argued that this is where the cost of a subquery has gone.
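
To make the growth concrete, here is a small sketch that imitates Query 2.2
in Python. It is not how any query engine implements OPTIONAL; the function
optional_rows and the dictionary layout are assumptions made for the
example, with None standing in for an unbound ?e2.

```python
def optional_rows(emails_by_person):
    """Imitate Query 2.2: bind ?e1 to each email, then optionally bind
    ?e2 to every other email of the same person (None means unbound)."""
    rows = []
    for person, emails in emails_by_person.items():
        for e1 in emails:
            others = [e2 for e2 in emails if e2 != e1]
            if others:
                rows.extend((person, e1, e2) for e2 in others)
            else:
                rows.append((person, e1, None))
    return rows


# Data 2: two addresses for :p1, one for :p2.
data2 = {":p1": ["p1e1", "p1e2"], ":p2": ["me"]}
rows = optional_rows(data2)
print(len(rows))  # 3, as in Result Set 2.2

# Testing for ?e2 unbound leaves only the people with exactly one email.
exactly_one = [person for (person, e1, e2) in rows if e2 is None]
print(exactly_one)  # [':p2']

# A third address for :p1 gives 3 * 2 = 6 rows for :p1 alone.
data3 = {":p1": ["p1e1", "p1e2", "p1e3"], ":p2": ["me"]}
p1_rows = [r for r in optional_rows(data3) if r[0] == ":p1"]
print(len(p1_rows))  # 6
```

In general a person with n distinct addresses (n > 1) contributes n * (n - 1)
rows before the unbound test throws them all away.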

 Andy

Attachments

application/x-zip-compressed attachment: unsaid2.zip

Received on Wednesday, 22 December 2004 10:26:46 UTC