- From: Simon Raboczi <raboczi@tucanatech.com>
- Date: Tue, 6 Jul 2004 03:09:43 -0400
- To: public-rdf-dawg@w3.org
Consider the following use case: "List everyone in my address book and their email addresses". For the moment assume everyone in my address book has a name. I'm only trying to demonstrate optional match on the email address field for the moment. Imagine my address book contains the following graph: _:ann :name "Ann" _:bob :name "Bob" _:bob :email <bob@work.com> _:carl :name "Carl" _:carl :email <carl@work.com> _:card :email <carl@school.edu> The symbols _:ann, _:bob, and _:carl are bnodes. Optional match is motivated by my desire to see Ann listed as part of my address book even though she has no email account. In a Squish-style language, the naive query: SELECT ?name ?email WHERE ( ?person :name ?name ) ( ?person :email ?email ) would not match Ann. The approaches a query language could take include: (1) Make more than one query. This is a "do nothing" option which we could use if 3.6 Optional Match isn't part of our query language. We leave it to the application to ask first for the names in the address book: SELECT ?name WHERE (?person :name ?name) getting the answer ?name = "Ann" or "Bob" or "Carl", and then for each of these answers we query again for the email addresses: SELECT ?email WHERE (?person :name "Ann") (?person :email ?email) SELECT ?email WHERE (?person :name "Bob") (?person :email ?email) SELECT ?email WHERE (?person :name "Carl") (?person :email ?email) If my address book has to be accessed over a network, the multiple queries will introduce latency problems that grow as my address book does. (Parenthetically, I don't consider SELECT ?name ?email WHERE (?person :name ?name) (?person :email ?email) to be equivalent to the three secondary queries. I don't know Bob's email address after making this query; I'd have to search through the results looking for the ?name "Bob" myself to discover it.) (2) Outer join This is the approach taken by SeRQL[1] and BRQL[2], and well-documented because of its use in SQL. SELECT ?name ?email WHERE (?person :name ?name) OPTIONALLY (?person :email ?email) As a logical variable binding expression, this would produce (?name = "Ann") or (?name = "Bob" and ?email = <bob@work.com>) or (?name = "Carl" and ?email = <carl@work.com>) or (?name = "Carl" and ?email = <carl@school.com) The fact that the first disjunction term (?name = "Ann") is independent of the value of the ?email variable means that this expression doesn't fit perfectly into a tabular form, and must be padded with a metasyntactic "null" value. In this case the null is quite well-defined to mean "independent of the column variable". ?name ?email +--------+-------------------+ | "Ann" | (null) | +--------+-------------------+ | "Bob" | <bob@work.com> | +--------+-------------------+ | "Carl" | <carl@work.com> | +--------+-------------------+ | "Carl" | <carl@school.edu> | +--------+-------------------+ (3) Embedded queries This is basically the approach of (1), except with the secondary queries included with the primary query. It's used by iTQL[3], and presuming that the XQuery "return" clause permits further queries to embedded within it, I believe it can be used by the XQuery-based languages REX and XsRQL. (Rob, Howard, someone check me on this?) SELECT ?name ?emails WHERE (?person :name ?name) THEN ?emails = (SELECT ?email WHERE (?person :email ?email)) This produces the logical variable binding (?name = "Ann") or (?name = "Bob" and ?email = <bob@work.com>) or (?name = "Carl" and (?email = <carl@work.com> or ?email = <carl@school.edu>)) and the tabular form ?name ?emails +--------+-----------------------+ | "Ann" | ?email | | | +--------+ | +--------+-----------------------+ | "Bob" | ?email | | | +----------------+ | | | | <bob@work.com> | | | | +----------------+ | +--------+-----------------------+ | "Carl" | ?email | | | +-------------------+ | | | | <carl@work.com> | | | | +-------------------+ | | | | <carl@school.edu> | | | | +-------------------+ | +--------+-----------------------+ The ?emails column takes tabular rather than atomic values, and Ann's list of email addresses is a table with zero rows rather than a "null". The variable binding expressions for the outer join and embedded query approaches are (as one would hope!) equivalent and can be converted from one to the other by the distributive law (A and (B or C)) <-> ((A and B) or (A and C)). Substituting leftwards results in the embedded query form, reducing the number of terms so that, for example, "Carl" only appears once. Substituting rightwards results in the outer join form, which is a simpler two-level sum of products (i.e. no embedded tables) but at the cost of duplicate fields (like "Carl") and "null" entries. (And to remove the simplifying assumption, now imagine that I don't know Carl's name, and that Bob also goes by the name "Robert". Now you *really* need optional match to make any sense of things.) [1] http://www.openrdf.org/doc/users/ch05.html#d0e1031 [2] http://jena.hpl.hp.com/~afs/BRQL.html#Triple_Matching_Features [3] http://www.kowari.org/193.htm
Received on Tuesday, 6 July 2004 03:10:23 UTC