- From: Simon Raboczi <raboczi@tucanatech.com>
- Date: Tue, 6 Jul 2004 03:09:43 -0400
- To: public-rdf-dawg@w3.org
Consider the following use case: "List everyone in my address book and
their email addresses". For now, assume everyone in my address book
has a name; I'm only trying to demonstrate optional match on the email
address field. Imagine my address book contains the following graph:
_:ann :name "Ann"
_:bob :name "Bob"
_:bob :email <bob@work.com>
_:carl :name "Carl"
_:carl :email <carl@work.com>
_:carl :email <carl@school.edu>
The symbols _:ann, _:bob, and _:carl are bnodes. Optional match is
motivated by my desire to see Ann listed as part of my address book
even though she has no email account. In a Squish-style language, the
naive query:
SELECT ?name ?email
WHERE ( ?person :name ?name )
( ?person :email ?email )
would not match Ann. The approaches a query language could take
include:
(1) Make more than one query.
This is a "do nothing" option which we could use if 3.6 Optional Match
isn't part of our query language. We leave it to the application to
ask first for the names in the address book:
SELECT ?name
WHERE (?person :name ?name)
getting the answer ?name = "Ann" or "Bob" or "Carl", and then for each
of these answers we query again for the email addresses:
SELECT ?email
WHERE (?person :name "Ann")
(?person :email ?email)
SELECT ?email
WHERE (?person :name "Bob")
(?person :email ?email)
SELECT ?email
WHERE (?person :name "Carl")
(?person :email ?email)
If my address book has to be accessed over a network, the multiple
queries will introduce latency problems that grow as my address book
does. (Parenthetically, I don't consider
SELECT ?name ?email
WHERE (?person :name ?name)
(?person :email ?email)
to be equivalent to the three secondary queries. I don't know Bob's
email address after making this query; I'd have to search through the
results looking for the ?name "Bob" myself to discover it.)
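To make the cost of option (1) concrete, here is a minimal Python sketch. The TRIPLES list, the names() and emails_for() helpers, and the query_count counter are all hypothetical stand-ins for a real store and its query interface; each helper call is meant to represent one round trip.

```python
# Hypothetical in-memory stand-in for the address book graph.
TRIPLES = [
    ("_:ann", ":name", "Ann"),
    ("_:bob", ":name", "Bob"),
    ("_:bob", ":email", "<bob@work.com>"),
    ("_:carl", ":name", "Carl"),
    ("_:carl", ":email", "<carl@work.com>"),
    ("_:carl", ":email", "<carl@school.edu>"),
]

query_count = 0  # number of simulated round trips to the store


def names():
    """Primary query: SELECT ?name WHERE (?person :name ?name)."""
    global query_count
    query_count += 1
    return [o for (s, p, o) in TRIPLES if p == ":name"]


def emails_for(name):
    """Secondary query: SELECT ?email WHERE (?person :name name)
    (?person :email ?email)."""
    global query_count
    query_count += 1
    people = {s for (s, p, o) in TRIPLES if p == ":name" and o == name}
    return [o for (s, p, o) in TRIPLES if p == ":email" and s in people]


# One primary query, then one secondary query per name returned.
address_book = {name: emails_for(name) for name in names()}
```

Under these assumptions the application issues one query plus one more per name, so the number of round trips grows linearly with the size of the address book.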
(2) Outer join
This is the approach taken by SeRQL[1] and BRQL[2], and it is well
documented because of its use in SQL.
SELECT ?name ?email
WHERE (?person :name ?name)
OPTIONALLY (?person :email ?email)
As a logical variable binding expression, this would produce
(?name = "Ann") or
(?name = "Bob" and ?email = <bob@work.com>) or
(?name = "Carl" and ?email = <carl@work.com>) or
(?name = "Carl" and ?email = <carl@school.edu>)
The fact that the first disjunction term (?name = "Ann") is independent
of the value of the ?email variable means that this expression doesn't
fit perfectly into a tabular form, and must be padded with a
metasyntactic "null" value. In this case the null is quite
well-defined to mean "independent of the column variable".
  ?name    ?email
+--------+-------------------+
| "Ann"  | (null)            |
+--------+-------------------+
| "Bob"  | <bob@work.com>    |
+--------+-------------------+
| "Carl" | <carl@work.com>   |
+--------+-------------------+
| "Carl" | <carl@school.edu> |
+--------+-------------------+
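A rough Python sketch of these outer join semantics, assuming the graph is held as an in-memory list of triples. The TRIPLES list and the optional_match() function are illustrative names of my own, not part of SeRQL or BRQL, and None plays the role of the "null" padding.

```python
# Hypothetical in-memory stand-in for the address book graph.
TRIPLES = [
    ("_:ann", ":name", "Ann"),
    ("_:bob", ":name", "Bob"),
    ("_:bob", ":email", "<bob@work.com>"),
    ("_:carl", ":name", "Carl"),
    ("_:carl", ":email", "<carl@work.com>"),
    ("_:carl", ":email", "<carl@school.edu>"),
]


def optional_match(triples):
    """Left outer join of the :name pattern with the :email pattern."""
    rows = []
    for person, pred, name in triples:
        if pred != ":name":
            continue
        emails = [o for (s, p, o) in triples
                  if s == person and p == ":email"]
        if emails:
            # One row per matching email; the name is repeated.
            rows.extend((name, email) for email in emails)
        else:
            # No match on the optional pattern: pad with None ("null").
            rows.append((name, None))
    return rows


rows = optional_match(TRIPLES)
```

Here rows holds one flat tuple per table row, so "Carl" appears twice and Ann's row carries None in the ?email position.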
(3) Embedded queries
This is basically the approach of (1), except with the secondary
queries included with the primary query. It's used by iTQL[3], and
presuming that the XQuery "return" clause permits further queries to
be embedded within it, I believe it can be used by the XQuery-based
languages REX and XsRQL. (Rob, Howard, someone check me on this?)
SELECT ?name ?emails
WHERE (?person :name ?name)
THEN ?emails = (SELECT ?email
                WHERE (?person :email ?email))
This produces the logical variable binding
(?name = "Ann") or
(?name = "Bob" and ?email = <bob@work.com>) or
(?name = "Carl" and (?email = <carl@work.com> or
                     ?email = <carl@school.edu>))
and the tabular form
  ?name    ?emails
+--------+-----------------------+
| "Ann"  | ?email                |
|        | +--------+            |
+--------+-----------------------+
| "Bob"  | ?email                |
|        | +----------------+    |
|        | | <bob@work.com> |    |
|        | +----------------+    |
+--------+-----------------------+
| "Carl" | ?email                |
|        | +-------------------+ |
|        | | <carl@work.com>   | |
|        | +-------------------+ |
|        | | <carl@school.edu> | |
|        | +-------------------+ |
+--------+-----------------------+
The ?emails column takes tabular rather than atomic values, and Ann's
list of email addresses is a table with zero rows rather than a "null".
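As a sketch of what the embedded form returns, here is a small Python version, again assuming an in-memory list of triples. TRIPLES and embedded_query() are hypothetical names for illustration and do not reflect how iTQL is implemented; the inner Python list stands in for the nested ?email table.

```python
# Hypothetical in-memory stand-in for the address book graph.
TRIPLES = [
    ("_:ann", ":name", "Ann"),
    ("_:bob", ":name", "Bob"),
    ("_:bob", ":email", "<bob@work.com>"),
    ("_:carl", ":name", "Carl"),
    ("_:carl", ":email", "<carl@work.com>"),
    ("_:carl", ":email", "<carl@school.edu>"),
]


def embedded_query(triples):
    """For each ?person with a :name, run the inner :email query with
    ?person already bound, and attach its result as a nested table."""
    result = []
    for person, pred, name in triples:
        if pred != ":name":
            continue
        # Inner query: SELECT ?email WHERE (?person :email ?email)
        emails = [o for (s, p, o) in triples
                  if s == person and p == ":email"]
        result.append((name, emails))
    return result


nested = embedded_query(TRIPLES)
```

Each name appears exactly once, and Ann's entry is an empty list rather than a None, mirroring the zero-row table.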
The variable binding expressions for the outer join and embedded query
approaches are (as one would hope!) equivalent and can be converted
from one to the other by the distributive law (A and (B or C)) <-> ((A
and B) or (A and C)). Substituting leftwards results in the embedded
query form, reducing the number of terms so that, for example, "Carl"
only appears once. Substituting rightwards results in the outer join
form, which is a simpler two-level sum of products (i.e. no embedded
tables) but at the cost of duplicate fields (like "Carl") and "null"
entries.
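The two directions of that substitution can be sketched in Python. The flatten() and nest() helpers are hypothetical names of my own; flatten() distributes each name over its emails (rightwards), and nest() factors repeated names back out (leftwards). Note the assumption that nest() groups rows by the ?name value alone, which is only safe while each name identifies one person.

```python
def flatten(nested):
    """Embedded form -> outer join form: (A and (B or C)) becomes
    ((A and B) or (A and C)), with None padding for empty tables."""
    rows = []
    for name, emails in nested:
        if emails:
            rows.extend((name, email) for email in emails)
        else:
            rows.append((name, None))
    return rows


def nest(rows):
    """Outer join form -> embedded form: group rows that share a name.
    Assumes the name alone identifies the person."""
    grouped = {}
    for name, email in rows:
        bucket = grouped.setdefault(name, [])
        if email is not None:
            bucket.append(email)
    return list(grouped.items())


nested = [
    ("Ann", []),
    ("Bob", ["<bob@work.com>"]),
    ("Carl", ["<carl@work.com>", "<carl@school.edu>"]),
]
flat = flatten(nested)
round_trip = nest(flat)
```

For this data the round trip gives back the nested form unchanged, with "Carl" duplicated only in the flat rows.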
(And to remove the simplifying assumption, now imagine that I don't
know Carl's name, and that Bob also goes by the name "Robert". Now you
*really* need optional match to make any sense of things.)
[1] http://www.openrdf.org/doc/users/ch05.html#d0e1031
[2] http://jena.hpl.hp.com/~afs/BRQL.html#Triple_Matching_Features
[3] http://www.kowari.org/193.htm
Received on Tuesday, 6 July 2004 03:10:23 UTC