Test cases combining MINUS and NOT EXISTS from Andy Seaborne on 2010-07-24 (public-rdf-dawg@w3.org from July to September 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Sat, 24 Jul 2010 20:34:07 +0100
To: SPARQL Working Group WG <public-rdf-dawg@w3.org>
Message-ID: <4C4B402F.8030406@epimorphics.com>
I've added test cases to the negation test suite after I was asked how
one might query for subset relationships. The queries calculate which
sets are subsets of other sets in the data, look for set quality and
find proper subsets.

There are other ways to achieve the tasks, and probably lost of better
ways but these provide combinations of MINUS and NOT EXISTS in the same
query.  The tests do illustrate the different styles of negation for
MINUS and NOT EXISTS.

Style point: I've committed SPARQL XML Results format files.  Does
anyone mind if we use the JSON format as well?

	Andy

Data:
------
:a rdf:type :Set .
:a :member 1 .
:a :member 2 .
:a :member 3 .

:b rdf:type :Set .
:b :member 1 .
:b :member 9 .

:c rdf:type :Set .
:c :member 1 .
:c :member 2 .

:d rdf:type :Set .
:d :member 1 .
:d :member 9 .

:e rdf:type :Set .
:e :member 1 .
:e :member 2 .

:empty rdf:type :Set .
-------
	:b,:d and :c,:e have the same members.
	:c and :e are proper subsets of :a

Having "rdf:type :Set" makes it easy to find the sets.  Otherwise, we'd
need subqueries: { SELECT DISTINCT ?s { :s :member [] } } and the empty
set would not be there.


query for subset-01:
Find subsets, include the case of same elements.

Could eliminate the case of (A,B) and (B,A) by
    FILTER(str(?s1) < str(?s2))
but that only works because the data uses IRIs, not bnodes.



------
PREFIX :    <http://example/>
PREFIX  rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# SPARQL 1.1
SELECT (?s1 AS ?subset) (?s2 AS ?superset)
WHERE
{
    # All pairs of sets
    ?s2 rdf:type :Set .
    ?s1 rdf:type :Set .
    FILTER(?s1 != ?s2)
    MINUS
    {
    	# The MINUS RHS is (?s1, ?s2) where ?s1 has a member not in ?s2
        # All pairs of sets ...

        ?s1 rdf:type :Set .
        ?s2 rdf:type :Set .

        ?s1 :member ?x .
        FILTER NOT EXISTS { ?s2 :member ?x . }
    }
}

------

query for subset-02:
Look for only proper subsets.
Imperfect - it does not do set equality testing, so it leaves in (:b,:d)
(:c,:e) and the reverse pairs but it does use MINUS twice and a ||
FILTER with NOT EXISTS

Because this uses "?s1 :member ?x" it assumes that ?s1 does have at
least member, so (:emptyset, :emptyset) is left in the answers.

The second MINUS eliminates that case by testing existence with a free
variable, unlike the previous part of the query.

------
PREFIX :    <http://example/>
PREFIX  rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT (?s1 AS ?subset) (?s2 AS ?superset)
WHERE
{
    # All pairs of sets
    ?s2 rdf:type :Set .
    ?s1 rdf:type :Set .
    MINUS {
        ?s1 rdf:type :Set .
        ?s2 rdf:type :Set .
        # Assumes ?s1 has at least one member
        ?s1 :member ?x .
        # If we want to exclude A as a subset of A.
        # This is not perfect as "?s1 = ?s2" is not a
	# contents based comparison.
        FILTER ( ?s1 = ?s2 || NOT EXISTS { ?s2 :member ?x . } )
    }
    MINUS {
        # If we don't want the empty set being a subset of itself.
        ?s1 rdf:type :Set .
        ?s2 rdf:type :Set .
        # Choose the pair (empty set, empty set)
        FILTER ( NOT EXISTS { ?s1 :member ?y . } )
        FILTER ( NOT EXISTS { ?s2 :member ?y . } )
        }
}
------

query for set-equality-1.rq
Find all pairs that are different resources but the set has the same
members.

----
PREFIX :    <http://example/>
PREFIX  rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Find sets that have exactly the same members.
# Find all (s1,s2) such that (s1 subset of s) and (s2 subset of s1).

SELECT DISTINCT ?s1 ?s2
WHERE
{
    ?s2 rdf:type :Set .
    ?s1 rdf:type :Set .
    FILTER(str(?s1) < str(?s2))
    MINUS
    {
        ?s1 rdf:type :Set .
        ?s2 rdf:type :Set .
        ?s1 :member ?x .
        FILTER NOT EXISTS { ?s2 :member ?x . }
    }
    MINUS
    {
        ?s1 rdf:type :Set .
        ?s2 rdf:type :Set .
        ?s2 :member ?x .
        FILTER NOT EXISTS { ?s1 :member ?x . }
    }
}
----

query for subset-03:
Putting this all together, combining subset-01 and set-equality-1.rq, we
get use of nested MINUS.  There must be a better way.

----
PREFIX :    <http://example/>
PREFIX  rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT (?s1 AS ?subset) (?s2 AS ?superset)
WHERE
{
    # All pairs of sets except (S,S)
    ?s2 rdf:type :Set .
    ?s1 rdf:type :Set .
    MINUS {
        # See subset-01 ...
        ?s1 rdf:type :Set .
        ?s2 rdf:type :Set .
        ?s1 :member ?x .
        FILTER ( NOT EXISTS { ?s2 :member ?x . } )
    }
    # Remove those that are the pairs with the same elements.
    # See set-equals-1
    MINUS {
        ?s2 rdf:type :Set .
        ?s1 rdf:type :Set .
        MINUS
        {
            ?s1 rdf:type :Set .
            ?s2 rdf:type :Set .
            ?s1 :member ?x .
            FILTER NOT EXISTS { ?s2 :member ?x . }
        }
        MINUS
        {
            ?s1 rdf:type :Set .
            ?s2 rdf:type :Set .
            ?s2 :member ?x .
            FILTER NOT EXISTS { ?s1 :member ?x . }
        }
    }
}
----
Received on Saturday, 24 July 2010 19:34:30 UTC