Re: limit per resource rethought... from Paul Gearon on 2010-08-11 (public-rdf-dawg@w3.org from July to September 2010)

From: Paul Gearon <gearon@ieee.org>
Date: Wed, 11 Aug 2010 14:28:07 -0400
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <AANLkTimFtizJOSRZ4423BBF_qsQyJb5+F87A2=d-Lnt7@mail.gmail.com>
I'm not advocating anything here, but as a point of comparison I thought
I'd explain Mulgara's subqueries in its original query language
(TQL). I'd like to point out that I had nothing to do with this
system!  :-)

Instead of having subqueries in the WHERE clause of the query,
subqueries in TQL were always in the SELECT clause. It wasn't really
like SPARQL's projection, in that these subqueries did not return
scalars, but rather returned a set of tuples, in the same way that a
query returns a set of tuples. (Our output format allowed such
embedding.) You could convert something like this into a scalar by
wrapping it in a function that took a set of tuples and returned a
scalar, such as COUNT().

This was implemented by re-executing the subquery for every
solution being projected. Importantly, any variables in scope in the
outer query were prebound when executing a subquery.
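
To make that evaluation strategy concrete, here is a rough Python sketch.
It is not Mulgara's code: the sample triples and the names TRIPLES, match
and evaluate are all invented for illustration, and the ordering and limit
of 3 are hard-coded to mirror the "first 3 friends" question.

```python
# Hypothetical data: (subject, predicate, object) triples.
TRIPLES = [
    (":Fred", "a", ":Person"),
    (":Sally", "a", ":Person"),
    (":Fred", ":knows", ":Tom"),
    (":Fred", ":knows", ":Dick"),
    (":Fred", ":knows", ":Harry"),
    (":Fred", ":knows", ":Zed"),
    (":Sally", ":knows", ":Suzie"),
    (":Sally", ":knows", ":Fred"),
]


def match(pattern, bindings):
    """Yield every extension of `bindings` that matches one triple pattern."""
    for triple in TRIPLES:
        new = dict(bindings)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A prebound variable must agree with the triple's value.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def evaluate():
    """Re-run the subquery once per outer solution, with ?P prebound."""
    results = []
    for outer in match(("?P", "a", ":Person"), {}):
        inner = sorted(
            match(("?P", ":knows", "?F"), outer),  # ?P comes from `outer`
            key=lambda b: b["?F"],                 # ORDER BY ?F
        )[:3]                                      # LIMIT 3
        results.append(
            {"?P": outer["?P"], "?k0": [{"?F": b["?F"]} for b in inner]}
        )
    return results


results = evaluate()
```

The point of the sketch is only that the limit applies per outer solution,
because the subquery is evaluated afresh for each value of ?P.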

Using this approach, it would be possible to get the information
you're talking about using a query like:

SELECT ?P (
    SELECT ?F
    WHERE ?P :knows ?F
    ORDER BY ?F
    LIMIT 3
)
WHERE ?P a :Person


(That's not exact, since TQL has a compulsory FROM clause, but that's
the only real difference). Notice how the SELECT clause has two
elements. The first is ?P, while the second is unlabeled (obviously,
no one thought to use the "AS" operator). Unlabeled elements were
automatically assigned a variable name. Consequently, the answer had
an unfortunate format:

{?P=:Fred, ?k0={{?F=:Dick}, {?F=:Harry}, {?F=:Tom}}}
{?P=:Sally, ?k0={{?F=:Fred}, {?F=:Mary}, {?F=:Suzie}}}
etc...

Though this would be easy enough to "flatten" using a cross-product approach.
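
As a rough sketch of what I mean by flattening (Python, with a made-up
helper name flatten; the data is just the two rows shown above), each
outer row is crossed with the tuples in its nested set:

```python
# The two nested result rows from the example answer.
nested = [
    {"?P": ":Fred", "?k0": [{"?F": ":Dick"}, {"?F": ":Harry"}, {"?F": ":Tom"}]},
    {"?P": ":Sally", "?k0": [{"?F": ":Fred"}, {"?F": ":Mary"}, {"?F": ":Suzie"}]},
]


def flatten(rows, nested_var="?k0"):
    """Cross each outer row with the tuples in its nested result set."""
    flat = []
    for row in rows:
        # Everything except the nested column is carried into each output row.
        outer = {k: v for k, v in row.items() if k != nested_var}
        for inner in row[nested_var]:
            flat.append({**outer, **inner})
    return flat


flat = flatten(nested)
```

Note that in this version a row whose nested set is empty simply
disappears, so it behaves like a join rather than an OPTIONAL.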

My point here is that this kind of subquery makes the original
question easy to answer. I'd rather see an approach that makes it easy
to answer questions, rather than trying to stick with the existing
system and requiring deep knowledge to bend it to work for you.

That said, I don't think that the working group has time to consider
significant changes like this. It's just an idea to put out there.

Regards,
Paul Gearon

On Wed, Aug 11, 2010 at 12:59 PM, Axel Polleres <axel.polleres@deri.org> wrote:
>> "Give me all persons and the first 3 (alphabetically by name) of their friends"
>
>
> An awkward workaround for that query... I think it could be done, in an ugly way, with something like the following:
>
>  SELECT ?P ?F
>  { ?P a :Person .
>    OPTIONAL { ?P :knows ?F FILTER ( NOT EXISTS {?P :knows ?F1,?F2,?F3 FILTER (?F1 < ?F && ?F2 < ?F && ?F3 < ?F ) } ) }
>  } ORDER BY ?P,?F
>
>
> not yet tested... but BRRRRRRRRRRRRRRRRRRR! ;-\
>
> I don't see a way to do this at all with LIMIT so far...
>
> Axel
>
>
> On 11 Aug 2010, at 11:23, Steve Harris wrote:
>
>> On 2010-08-11, at 10:56, Axel Polleres wrote:
>>
>> > Hi all again,
>> >
>> > in the course of my last mail, I also thought about adding another example:
>> >
>> > E)  "Give me all persons and the first 3 (alphabetically by name) of their friends"
>> >
>> > but I couldn't find any way to write this in an intuitive manner... ideas anyone?
>> > I am afraid that, given the scoping we have imposed for subqueries, that one might be difficult or impossible.
>> >
>> > The following naive attempt obviously doesn't work:
>> >
>> >  SELECT ?P ?F
>> >  { ?P a :Person .
>> >    {SELECT ?P ?F { ?P :knows ?F . ?F :name ?N } ORDER BY ?N LIMIT 3 }
>> >  }
>> >
>> > I also think I recall that we had this discussion already some time back in some other form,
>> > but I can't recall the outcome :-|
>>
>> This doesn't work because bindings happen bottom up, so the subSELECT is bound before the outer SELECT. I find this counter-intuitive, but as I understand it there are technical reasons why it has to be that way round.
>>
>> Another thing that also doesn't quite work is:
>>
>> SELECT ?p (SAMPLE(?f) AS ?f1) (SAMPLE(?f) AS ?f2) (SAMPLE(?f) AS ?f3)
>> WHERE {
>>    ?p a :Person .
>>    ?p :knows ?f .
>> } GROUP BY ?p
>>
>> The problem is that by the definition of the Sample set function, you get the same binding of ?f each time.
>>
>> We could have a form of SAMPLE() like SAMPLE(?var ; offset=N), which takes the Nth value, rather than the 0th.
>>
>> - Steve
>>
>> PS look, no commas :)
>>
>> --
>> Steve Harris, CTO, Garlik Limited
>> 1-3 Halford Road, Richmond, TW10 6AW, UK
>> +44 20 8439 8203  http://www.garlik.com/
>> Registered in England and Wales 535 7233 VAT # 849 0517 11
>> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
>>
>>
>
>
>
Received on Wednesday, 11 August 2010 18:28:37 UTC