Re: Proposed definition of ExprMultiSet from Andy Seaborne on 2010-03-08 (public-rdf-dawg@w3.org from January to March 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Mon, 08 Mar 2010 15:57:32 +0000
To: Steve Harris <steve.harris@garlik.com>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4B951E6C.1030402@talis.com>
On 08/03/2010 2:20 PM, Steve Harris wrote:
> On 7 Mar 2010, at 22:57, Andy Seaborne wrote:
>
>> Overall - we seem to have the start of a possible design and so this
>> message is about details.
>>
>> On 07/03/2010 9:33 PM, Steve Harris wrote:
>>> On 7 Mar 2010, at 17:42, Andy Seaborne wrote:
>>>
>>>> ISSUE-53
>>>>
>>>> I propose the following to define ExprMultiSet:
>>>>
>>>> -------
>>>>
>>>> Let Ω be a partition.
>>>>
>>>> ExprMultiSet(Ω) =
>>>> { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
>>>> UNION
>>>> { e | μ in Ω such that eval(μ(expr)) is undefined }
>>>>
>>>> where "e" is some symbol that is distinct from all RDF terms.
>>>>
>>>> card[x]:
>>>> if DISTINCT:
>>>> card[x] = 1 if there exists μ in Ω such that x = eval(μ(expr))
>>>> card[x] = 0 otherwise
>>>> else
>>>> card[x] = count of μ in Ω such that x = eval(μ(expr))
>>>
>>> I find the reuse of the term ExprMultiset as a function very confusing,
>>> but I think I understand the proposal.
>>
>> It's just trying to write the ExprMultiset based on Ω for which there
>> is no notation. I suppose it should involve μ. It is only about whether
>> you like to write definitions with free terms or not.
>>
>> "ExprMultiset based on Ω, expr = ... "
>
> I believe that was handled in the definition of Aggregation()
> previously, but possibly there's some term missing.
>

ExprMultiset appears first in:

[[
Aggregation(GroupClause, ExprMultiset, func, Ω) =

    { merge(k, func( { μ'(exp) | exp in ExprMultiset, μ' in Ω' } )
      | (k, Ω') in Partition(GroupClause, Ω) }
]]

but also

[[
  If this keyword is present then any
duplicate values in exp · μ' are removed, effectively making
ExprMultiset a set.
]]

It seems to be used both as a set of expressions, and also as the 
results after evaluation.

I'm giving a name+definition to the thing that is the outcome of 
evaluating the expression over a multiset of solutions.  This is then 
used on a partition.

(which reminds me - there is only one expression in an aggregate in the 
standard ones but it's not impossible to think of n-ary ones e.g.
min-distance(point1, point2))


Some definitions:
   rough wording - needs refining
   only copes with one expression
   does not deal with aggregator parameters

--------
Defn: ExprValueMultiSet

An ExprValueMultiSet is the multiset formed by evaluating an 
expression for each solution in a multiset of solutions.

ExprValueMultiSet of expr and Ω =
   { eval(expr,μ) | μ in Ω such that eval(μ(expr)) is defined }
   UNION
   { e | μ in Ω such that eval(μ(expr)) is undefined }

where "e" is some symbol that is distinct from all RDF terms.

card[x]:
   if DISTINCT:
     card[x] = 1 if there exists μ in Ω such that x = eval(μ(expr))
     card[x] = 0 otherwise
   else
     card[x] = count of μ in Ω such that x = eval(μ(expr))
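
As a rough illustration of the definition above, here is a minimal Python
sketch. It assumes a solution μ is a dict from variable names to values, an
expression is a Python callable that raises an exception when its evaluation
is undefined, and ERR stands in for the symbol "e". The names
expr_value_multiset and ERR are invented for this sketch, and giving "e" a
cardinality in the same way as ordinary values is also an assumption, since
the card[x] rules do not spell that case out.

```python
from collections import Counter

# Stand-in for the symbol "e": distinct from every RDF term (assumption).
ERR = object()

def expr_value_multiset(expr, omega, distinct=False):
    """Return a Counter mapping each value x to card[x]."""
    cards = Counter()
    for mu in omega:
        try:
            x = expr(mu)      # evaluation is defined
        except Exception:
            x = ERR           # evaluation is undefined
        cards[x] += 1
    if distinct:
        # DISTINCT: every value that occurs gets cardinality 1.
        cards = Counter({x: 1 for x in cards})
    return cards

omega = [{"x": 1}, {"x": 1}, {"y": 2}]
get_x = lambda mu: mu["x"]    # undefined (KeyError) when ?x is unbound

plain = expr_value_multiset(get_x, omega)
unique = expr_value_multiset(get_x, omega, distinct=True)
```

With these three solutions, the value 1 has cardinality 2 in the plain
multiset and 1 under DISTINCT, and the unbound case contributes one ERR.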

--------

Defn: Aggregation

An aggregation is the multiset of solutions formed by aggregating the 
solutions in a partition, for each partition in a group, together with 
the key for the group:

Aggregation(GroupClause, ExprValueMultiSet, func, var, Ω) =

    { merge(k, (var, func( EVMS )))
      | EVMS is the ExprValueMultiSet of (expr, Ω'),
        (k, Ω') in Partition(GroupClause, Ω) }

card[x] = 1 if x in Aggregation else 0
# The key is different in each row so there will be no duplicates.
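
The Aggregation definition above can be sketched in Python along the same
lines. This is only an illustration under stated assumptions: solutions are
dicts, the group clause is a list of variable names, an unbound grouping
variable is keyed as None, and the example aggregator agg_sum ignores the
error symbol. The helper names (evms, partition, aggregation, agg_sum) are
hypothetical and do not come from any draft.

```python
from collections import Counter

ERR = object()  # stand-in for the symbol "e" (assumption)

def evms(expr, omega):
    """ExprValueMultiSet of (expr, omega) as a Counter of cardinalities."""
    cards = Counter()
    for mu in omega:
        try:
            cards[expr(mu)] += 1
        except Exception:
            cards[ERR] += 1
    return cards

def partition(group_vars, omega):
    """Map each key k to the multiset of solutions sharing that key."""
    groups = {}
    for mu in omega:
        key = tuple((v, mu.get(v)) for v in group_vars)
        groups.setdefault(key, []).append(mu)
    return groups

def aggregation(group_vars, expr, func, var, omega):
    """One solution per partition: merge(k, (var, func(EVMS)))."""
    result = []
    for key, part in partition(group_vars, omega).items():
        row = dict(key)                   # the key k as a solution
        row[var] = func(evms(expr, part)) # bind var to the aggregate value
        result.append(row)
    return result

def agg_sum(cards):
    # Example aggregator; skipping ERR is an assumption of this sketch.
    return sum(x * n for x, n in cards.items() if x is not ERR)

omega = [{"g": "a", "x": 1}, {"g": "a", "x": 2}, {"g": "b", "x": 5}]
rows = aggregation(["g"], lambda mu: mu["x"], agg_sum, "total", omega)
```

Each key appears in exactly one output row, which matches the remark that
the result contains no duplicates.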

--------

I added "var" as the variable being bound to the aggregate value but 
we also have situations where there is no variable name given.

Maybe a definition that just defines the value is better, leaving the 
binding and merging to the place where the aggregate is used.  That's 
also not straightforward, as the values for each partition then have to 
be aligned with the keys.
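
One way to picture the value-only alternative is the Python sketch below. It
is a guess at the shape of such a definition, not a worked-out proposal:
aggregate_values returns the aggregate value per key without binding any
variable, and a separate, hypothetical bind step does the binding and merging
at the point of use. Skipping undefined evaluations, rather than recording an
error symbol, is a simplifying assumption here.

```python
def aggregate_values(group_vars, expr, func, omega):
    """Map each group key to its aggregate value; no variable is bound."""
    groups = {}
    for mu in omega:
        key = tuple(mu.get(v) for v in group_vars)
        groups.setdefault(key, []).append(mu)
    values = {}
    for key, part in groups.items():
        evaluated = []
        for mu in part:
            try:
                evaluated.append(expr(mu))
            except Exception:
                pass  # assumption: undefined evaluations are skipped
        values[key] = func(evaluated)
    return values

def bind(group_vars, values, var):
    """Align each value with its key and bind it to var (hypothetical step)."""
    return [dict(zip(group_vars, key), **{var: value})
            for key, value in values.items()]

omega = [{"g": "a", "x": 1}, {"g": "a", "x": 2}, {"g": "b", "x": 5}]
values = aggregate_values(["g"], lambda mu: mu["x"], max, omega)
rows = bind(["g"], values, "m")
```

Keeping the key next to each value in the intermediate mapping is what makes
the later alignment possible in this sketch.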

 Andy
Received on Monday, 8 March 2010 15:58:05 UTC