Re: Not understanding UNION well, or something

Bob DuCharme wrote:
> Lee,
> 
> That first one worked great out of the box (untested? I'm impressed) 
> with a few repeated rows, and DISTINCT fixed that.

Great!

> When I first wrote my original query, I picked two directors who were
> unlikely to have many actors in common, for the novelty value of
> finding someone they did share. I also wanted a form of the query that
> would work for directors who were likely to have actors in common, and
> when I replaced the directors' names with Woody Allen and Robert
> Altman, it worked just great. The results even listed the three Mia
> Farrow/Woody Allen movies along with her one Altman film, and Michael
> Murphy's two Altman movies along with his one Allen film.
> 
> At first I didn't understand the role of your movie2 variable, but I 
> think I do now: we're only interested in binding movie1 (and its 
> movieName) if something can be bound to movie2 as well, and movie2's 
> name will come up in a different result row when that movie binds to 
> movie1. Is that correct?

That's correct. That's the reason it's sort of a tricky query: we're 
really asking to find two movies, but only interested in hearing about 
one at a time. Actually, though, that points out an alternative way to 
do the query:

PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc:    <http://purl.org/dc/terms/>
SELECT ?actorName ?johnwatersMovie ?stevenspielbergMovie WHERE {

   # bind the two directors
   ?jw movie:director_name "John Waters" .
   ?ss movie:director_name "Steven Spielberg" .

   # we want to find two movies, one by each director

   ?m1 movie:director ?jw ;
     dc:title ?johnwatersMovie ;
     movie:actor ?actor .
   ?m2 movie:director ?ss ;
     dc:title ?stevenspielbergMovie ;
     movie:actor ?actor .

   ?actor movie:actor_name ?actorName .
}

This one is the most straightforward - just find pairs of movies by JW
and SS and return the pair. The drawback of this approach is that for
directors and actors with a large number of common movies, the result
set is going to have one row for every pair of movies that the
particular actor was in for the two directors. For example, an actor who
appeared in three John Waters movies and two Spielberg movies would
show up in 3 x 2 = 6 rows.
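
One way to avoid that row explosion is to let the engine collapse the
pairs into one row per actor. The sketch below assumes an endpoint that
supports SPARQL 1.1 aggregates (GROUP BY and GROUP_CONCAT), which were
standardized after this thread and may not be available everywhere; the
output variable names and the " | " separator are arbitrary choices.

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc:    <http://purl.org/dc/terms/>
# Same pattern as the pair query, but grouped so that each actor yields
# one row; DISTINCT inside GROUP_CONCAT drops the titles that the
# pairing repeats.
SELECT ?actorName
       (GROUP_CONCAT(DISTINCT ?jwTitle; SEPARATOR=" | ") AS ?johnWatersMovies)
       (GROUP_CONCAT(DISTINCT ?ssTitle; SEPARATOR=" | ") AS ?spielbergMovies)
WHERE {
   ?jw movie:director_name "John Waters" .
   ?ss movie:director_name "Steven Spielberg" .

   ?m1 movie:director ?jw ;
     dc:title ?jwTitle ;
     movie:actor ?actor .
   ?m2 movie:director ?ss ;
     dc:title ?ssTitle ;
     movie:actor ?actor .

   ?actor movie:actor_name ?actorName .
}
# group on the actor IRI too, so two actors who share a name stay separate
GROUP BY ?actor ?actorName
```

With this grouping, the actor in the 3 x 2 example should come back as a
single row listing the three John Waters titles and the two Spielberg
titles.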

Instead of naming the variables after the directors in question as I
did here, you can project out ?movieName1 ?dirName1 ?movieName2
?dirName2. Apart from renaming the two title variables, the only change
to the query is to add triple patterns that bind the directors' names
explicitly, as in our first query:

   ?jw movie:director_name ?dirName1 .
   ?ss movie:director_name ?dirName2 .
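
Spelled out in full, that variant might look like the following sketch.
Binding ?dirName1 and ?dirName2 this way just echoes the literals already
matched, so it adds columns rather than changing which rows come back
(unless a director resource carries more than one director_name value).

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc:    <http://purl.org/dc/terms/>
SELECT ?actorName ?movieName1 ?dirName1 ?movieName2 ?dirName2 WHERE {

   # bind the two directors
   ?jw movie:director_name "John Waters" .
   ?ss movie:director_name "Steven Spielberg" .

   # bind their names explicitly so they can be projected
   ?jw movie:director_name ?dirName1 .
   ?ss movie:director_name ?dirName2 .

   # one movie by each director, sharing the same actor
   ?m1 movie:director ?jw ;
     dc:title ?movieName1 ;
     movie:actor ?actor .
   ?m2 movie:director ?ss ;
     dc:title ?movieName2 ;
     movie:actor ?actor .

   ?actor movie:actor_name ?actorName .
}
```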


> Because of what you wrote about a movie with multiple directors, I tried
> Woody Allen and Francis Ford Coppola: besides having actors in common
> (Diane Keaton, Joe Mantegna, others), they both directed parts of
> "New York Stories" along with Martin Scorsese. Since that is a
> multiple-director movie that Allen also acted in, he was therefore "in"
> a Coppola movie. The query results ended up listing every movie Allen
> acted in, with two entries for "What's Up, Tiger Lily?" because it too
> had two directors, crediting Allen and the Japanese guy who directed the
> original movie that Allen redubbed.
> 
> I didn't try your second query because it looked like it was asking for 
> so much before filtering that I worried that it asked too much of the 
> server (and because your first version worked so well!), although I 
> could be misunderstanding how typical SPARQL query processing works.

In practice, for many SPARQL engines I know about, you are right that
this would not be a particularly efficient way to write the query: such
an engine typically matches the triple patterns first and only then
applies the FILTER to a potentially large set of intermediate results.
In theory, though, there is no reason a query engine could not be smart
enough to push those filters down into query evaluation, and thus avoid
having to postprocess a large number of results.

I'd be interested to hear from any implementors who are doing that - 
what filters can you push into your underlying query / index search 
strategy, and how does using a FILTER compare (performance-wise) to 
using a straight-up triple pattern that matches against a literal value?

Lee

> thanks,
> 
> Bob
> 
> Lee Feigenbaum wrote:
>> To summarize from the blog post, the goal is to find all actors that 
>> appear in both a John Waters and a Steven Spielberg film. But now you 
>> also want to find all the movies (by those directors) that the actor 
>> was in. If my understanding is right, this is how I would go about it 
>> (untested):
>>
>> PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
>> PREFIX dc:    <http://purl.org/dc/terms/>
>> SELECT ?actorName ?dirName ?movieName WHERE {
>>   # bind the two directors
>>   ?jw movie:director_name "John Waters" .
>>   ?ss movie:director_name "Steven Spielberg" .
>>
>>   # we want to find two movies, one by each director
>>   # but we use two variables so that we can only
>>   # pull out the name of one of them
>>   {
>>     ?movie1 movie:director ?jw .
>>     ?movie2 movie:director ?ss .
>>   } UNION {
>>     ?movie2 movie:director ?jw .
>>     ?movie1 movie:director ?ss .
>>   }
>>
>>   ?movie1 dc:title ?movieName .
>>
>>   # the actor needs to be in both movies
>>   ?movie1 movie:actor ?actor .
>>   ?movie2 movie:actor ?actor .
>>
>>   ?actor movie:actor_name ?actorName .
>>
>>   # we need to repeat the director information
>>   # to be able to bind a variable to the director's
>>   # name - this may give extra results if a movie has
>>   # multiple directors
>>   ?movie1 movie:director [ movie:director_name ?dirName ] .
>> }
>>
>> There may yet be a more elegant way to do this, and I'm not positive 
>> I've got this right. I think that ARQ at least has a LET assignment 
>> operator that would avoid the need for the extra director binding 
>> there at the end.
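
SPARQL 1.1, standardized after this thread, added a BIND clause in the
same spirit as ARQ's LET. As a sketch, assuming an endpoint with SPARQL
1.1 support, the first query can assign ?dirName inside each UNION branch
instead of re-reading it from the data. Because the name now comes from
whichever branch matched, a ?movie1 with additional directors no longer
produces an extra row for each of them; DISTINCT is still needed because
several ?movie2 values can support the same row.

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dc:    <http://purl.org/dc/terms/>
SELECT DISTINCT ?actorName ?dirName ?movieName WHERE {
  # bind the two directors
  ?jw movie:director_name "John Waters" .
  ?ss movie:director_name "Steven Spielberg" .

  # each branch knows which director ?movie1 belongs to,
  # so it can assign ?dirName directly with BIND
  {
    ?movie1 movie:director ?jw .
    ?movie2 movie:director ?ss .
    BIND("John Waters" AS ?dirName)
  } UNION {
    ?movie2 movie:director ?jw .
    ?movie1 movie:director ?ss .
    BIND("Steven Spielberg" AS ?dirName)
  }

  ?movie1 dc:title ?movieName .

  # the actor needs to be in both movies
  ?movie1 movie:actor ?actor .
  ?movie2 movie:actor ?actor .

  ?actor movie:actor_name ?actorName .
}
```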
>>
>> Another way to approach this that is quite similar but might be 
>> considered easier would be to use FILTERs:
>>
>> PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
>> PREFIX dc:    <http://purl.org/dc/terms/>
>> SELECT DISTINCT ?actorName ?dirName1 ?movieName WHERE {
>>
>>   # this is the info we want to pull out
>>   ?movie1 dc:title ?movieName .
>>   ?actor movie:actor_name ?actorName .
>>
>>   # two movies with two directors
>>   ?movie1 movie:director [ movie:director_name ?dirName1 ].
>>   ?movie2 movie:director [ movie:director_name ?dirName2 ] .
>>
>>   # the same actor needs to be in both movies
>>   ?movie1 movie:actor ?actor .
>>   ?movie2 movie:actor ?actor .
>>
>>   # use the filter to check the director names
>>   FILTER(
>>     (?dirName1 = 'John Waters' && ?dirName2 = 'Steven Spielberg') ||
>>     (?dirName2 = 'John Waters' && ?dirName1 = 'Steven Spielberg')
>>   ) .
>> }
>>
>> This second one avoids some of the silliness since it relies on the 
>> fact that each actor for which this works will match the pattern two 
>> ways (one way with JW bound to ?dirName1 and one with SS bound to
>> it). We want to get *both* these results.
>>
>> In practice, I'd usually do something like this with multiple queries:
>> first find the relevant actors, then find their movies by JW and by SS.
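
A sketch of the first of those queries, assuming the same vocabulary as
above: it returns each qualifying actor once, along with the actor's IRI,
so that a second, simpler query can then look up that actor's movies by
JW and by SS.

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
# Step 1: find actors who appear in at least one movie by each director.
# No titles are selected, so each actor appears once instead of once per
# pair of movies.
SELECT DISTINCT ?actor ?actorName WHERE {
  ?jw movie:director_name "John Waters" .
  ?ss movie:director_name "Steven Spielberg" .

  ?m1 movie:director ?jw ;
      movie:actor ?actor .
  ?m2 movie:director ?ss ;
      movie:actor ?actor .

  ?actor movie:actor_name ?actorName .
}
```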
>>
>> hope this is helpful,
>> Lee
>>
>> Bob DuCharme wrote:
>>>
>>> I'm trying to expand the query shown at 
>>> http://www.snee.com/bobdc.blog/2008/11/sparql-at-the-movies.html#id203668 
>>> to include the director and movie names in the result of the query 
>>> sent to http://data.linkedmdb.org/sparql. I guess my main problem is 
>>> trying to understand how I can set it up so that ?actor is bound to 
>>> the same value throughout the query but ?movie and ?movieName can be 
>>> bound to different values in the two patterns. I know that one actor 
>>> was in a single movie by each of the two directors named below, and 
>>> while the following doesn't give me an error when submitted it gives 
>>> an unrelated set of data. I may be going about it completely wrong.
>>>
>>> Any suggestions?
>>>
>>> thanks,
>>>
>>> Bob
>>>
>>> ####################
>>>
>>> SELECT ?actorName ?dirName ?movieName WHERE {
>>>
>>>  ?dir1 <http://data.linkedmdb.org/resource/movie/director_name> "John Waters".
>>>  ?dir2 <http://data.linkedmdb.org/resource/movie/director_name> "Steven Spielberg".
>>>
>>>  ?actor  <http://data.linkedmdb.org/resource/movie/actor_name> ?actorName.
>>>
>>>   {
>>>     ?movie <http://data.linkedmdb.org/resource/movie/director> ?dir1;
>>>            <http://data.linkedmdb.org/resource/movie/actor> ?actor;
>>>            <http://purl.org/dc/terms/title> ?movieName.
>>>     ?dir1 <http://data.linkedmdb.org/resource/movie/director_name> ?dirName.
>>>   }
>>>
>>>   UNION
>>>
>>>   {
>>>     ?movie <http://data.linkedmdb.org/resource/movie/director> ?dir2;
>>>            <http://data.linkedmdb.org/resource/movie/actor> ?actor;
>>>            <http://purl.org/dc/terms/title> ?movieName.
>>>     ?dir2 <http://data.linkedmdb.org/resource/movie/director_name> ?dirName.
>>>   }
>>> }
>>>
>>>
> 
> 

Received on Thursday, 13 November 2008 03:09:22 UTC