Re: Proposed definition of ExprMultiSet from Steve Harris on 2010-03-08 (public-rdf-dawg@w3.org from January to March 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Mon, 8 Mar 2010 17:15:12 +0000
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <6F2B4ADD-A4A1-4045-8038-D188543180CE@garlik.com>
On 8 Mar 2010, at 15:57, Andy Seaborne wrote:
> On 08/03/2010 2:20 PM, Steve Harris wrote:
>> On 7 Mar 2010, at 22:57, Andy Seaborne wrote:
>>
>>> Overall - we seem to have the start of a possible design and so this
>>> message is about details.
>>>
>>> On 07/03/2010 9:33 PM, Steve Harris wrote:
>>>> On 7 Mar 2010, at 17:42, Andy Seaborne wrote:
>>>>
>>>>> ISSUE-53
>>>>>
>>>>> I propose the following to define ExprMultiSet:
>>>>>
>>>>> -------
>>>>>
>>>>> Let Ω be a partition.
>>>>>
>>>>> ExprMultiSet(Ω) =
>>>>> { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
>>>>> UNION
>>>>> { e | μ in Ω such that eval(μ(expr)) is undefined }
>>>>>
>>>>> where "e" is some symbol that is distinct from all RDF terms.
>>>>>
>>>>> card[x]:
>>>>> if DISTINCT:
>>>>> card[x] = 1 if there exists μ in Ω such that x = eval(μ(expr))
>>>>> card[x] = 0 otherwise
>>>>> else
>>>>> card[x] = count of μ in Ω such that x = eval(μ(expr))
>>>>
>>>> I find the reuse of the term ExprMultiset as a function very confusing,
>>>> but I think I understand the proposal.
>>>
>>> It's just trying to write the ExprMultiset based on Ω, for which there
>>> is no notation. I suppose it should involve μ. It's only about whether
>>> you like to write definitions with free terms or not.
>>>
>>> "ExprMultiset based on Ω, expr = ... "
>>
>> I believe that was handled in the definition of Aggregation()
>> previously, but possibly there's some term missing.
>>
>
> ExprMultiset appears first in:
>
> [[
> Aggregation(GroupClause, ExprMultiset, func, Ω) =
>
>   { merge(k, func( { μ'(exp) | exp in ExprMultiset, μ' in Ω' } ) )
>     | (k, Ω') in Partition(GroupClause, Ω) }
> ]]
>
> but also
>
> [[
> If this keyword is present then any
> duplicate values in exp · μ' are removed, effectively making
> ExprMultiset a set.
> ]]
>
> It seems to be used both as a set of expressions and as the results
> after evaluation.

Rather, I think the comment at the end is just wrong.

> I'm giving a name+definition to the thing that is the outcome of
> evaluating the expression over a multiset of solutions.  This is then
> used on a partition.

Right, currently that doesn't have a name.

> (which reminds me - there is only one expression in an aggregate in
> the standard ones, but it's not impossible to think of n-ary ones,
> e.g. min-distance(point1, point2))

It's a Multiset currently, hence the name. E.g. MAX(?x, ?y) w.r.t.
?x=1,2 / ?y=3,4 gives MAX({1,3,2,4}).
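
To make the MAX example concrete, here is a rough Python sketch of the
flattening: every expression in the ExprMultiset is evaluated against every
solution in the group, and duplicates are kept. The helper name
flatten_expr_multiset and the dict-based solutions are assumptions made for
illustration only; they are not part of any draft text.

```python
def flatten_expr_multiset(exprs, solutions):
    """Evaluate each expression against each solution, keeping duplicates."""
    return [expr(mu) for mu in solutions for expr in exprs]


# Hypothetical group with two solutions: (?x=1, ?y=3) and (?x=2, ?y=4).
solutions = [{"x": 1, "y": 3}, {"x": 2, "y": 4}]

# The multiset of expressions for MAX(?x, ?y).
exprs = [lambda mu: mu["x"], lambda mu: mu["y"]]

values = flatten_expr_multiset(exprs, solutions)
print(values)       # [1, 3, 2, 4]
print(max(values))  # 4
```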

Lee had some good use cases at the F2F for making it a multiset of
expressions.

- Steve

> Some definitions:
>  rough wording - needs refining
>  only copes with one expression
>  does not deal with aggregator parameters
>
> --------
> Defn: ExprValueMultiSet
>
> An ExprValueMultiSet is the multiset formed by evaluating an
> expression for each solution binding of a multiset of solutions.
>
> ExprValueMultiSet of expr and Ω =
>  { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
>  UNION
>  { e | μ in Ω such that eval(μ(expr)) is undefined }
>
> where "e" is some symbol that is distinct from all RDF terms.
>
> card[x]:
>  if DISTINCT:
>    card[x] = 1 if there exists μ in Ω such that x = eval(μ(expr))
>    card[x] = 0 otherwise
>  else
>    card[x] = count of μ in Ω such that x = eval(μ(expr))
>
> --------
>
> Defn: Aggregation
>
> An aggregation is the multiset of solutions formed by aggregating the
> solutions in a partition, for each partition in a group, together
> with the key for the group:
>
> Aggregation(GroupClause, ExprValueMultiSet, func, var, Ω) =
>
>   { merge(k, (var, func( EVMS )) )
>     | EVMS is the ExprValueMultiSet of (expr, Ω'),
>       (k, Ω') in Partition(GroupClause, Ω) }
>
> card[x] = 1 if x in Aggregation else 0
> # The key is different in each row so there will be no duplicates.
>
> --------
>
> I added "var" as the variable being bound to the aggregate value, but
> we also have situations where no variable name is given.
>
> Maybe a definition that just defines the value is better, leaving
> the binding and merging to the place where the aggregate is used.
> Even then, it is not straightforward to align the values for each
> partition with the keys.
>
>  Andy
>
>
>
>
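
For reference, the two quoted definitions (ExprValueMultiSet and
Aggregation) could be sketched in Python roughly as follows. This is only an
illustration under several assumptions: solutions are plain dicts, an
evaluation error is modelled as a KeyError or TypeError, DISTINCT also
collapses the error marker "e" (the definition does not say), and agg_sum
simply skips errors. All function names here are hypothetical.

```python
from collections import Counter, defaultdict

ERROR = object()  # stands in for "e", a symbol distinct from all RDF terms


def expr_value_multiset(expr, omega, distinct=False):
    """Map each value (or ERROR) to its cardinality, as in card[x]."""
    values = Counter()
    for mu in omega:
        try:
            x = expr(mu)
        except (KeyError, TypeError):  # assumption: how "undefined" shows up
            x = ERROR
        values[x] += 1
    if distinct:
        # card[x] = 1 for every value that occurs at least once
        values = Counter(dict.fromkeys(values, 1))
    return values


def partition(group_key, omega):
    """Group the solutions by key; each key is a tuple of (var, value) pairs."""
    groups = defaultdict(list)
    for mu in omega:
        groups[group_key(mu)].append(mu)
    return groups


def aggregation(group_key, expr, func, var, omega, distinct=False):
    """One solution per partition: the key merged with (var, func(EVMS))."""
    result = []
    for k, omega_k in partition(group_key, omega).items():
        evms = expr_value_multiset(expr, omega_k, distinct)
        result.append({**dict(k), var: func(evms)})
    return result


def agg_sum(evms):
    """Illustrative aggregate: sum the values, skipping the error marker."""
    return sum(x * n for x, n in evms.items() if x is not ERROR)


omega = [
    {"g": "a", "v": 1},
    {"g": "a", "v": 1},
    {"g": "a"},            # ?v unbound, so the expression is undefined
    {"g": "b", "v": 5},
]
group_key = lambda mu: (("g", mu["g"]),)
expr = lambda mu: mu["v"]

plain = aggregation(group_key, expr, agg_sum, "total", omega)
dedup = aggregation(group_key, expr, agg_sum, "total", omega, distinct=True)
print(plain)  # [{'g': 'a', 'total': 2}, {'g': 'b', 'total': 5}]
print(dedup)  # [{'g': 'a', 'total': 1}, {'g': 'b', 'total': 5}]
```

Since each key appears exactly once in the result, every solution in the
aggregation has cardinality 1, which matches the "card[x] = 1" remark in the
quoted definition.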

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Monday, 8 March 2010 17:15:41 UTC