Re: ACTION-24: aggregate functions with multiple answers

On 11 May 2009, at 16:47, Seaborne, Andy wrote:
>> -----Original Message-----
>> From: Lee Feigenbaum [] On Behalf Of Lee
>> Feigenbaum
>> Sent: 11 May 2009 16:28
>> To: Seaborne, Andy
>> Cc: SPARQL Working Group
>> Subject: Re: ACTION-24: aggregate functions with multiple answers
>> Seaborne, Andy wrote:
>>> ACTION-24: Explain potential design regarding aggregate functions with
>>> multiple answers for mixed datatypes re ISSUE-16
>>> Aggregate functions like MIN, MAX, SUM - in fact, most except COUNT -
>>> operate on the value space of a variable binding or the value of an
>>> expression, which is a value. Even COUNT of terms (COUNT(?x)) needs to
>>> deal with unbound variables (by skipping?).
>>> SUM(?x) requires that ?x is numeric, presumably according to the type
>>> promotion rules for XSD arithmetic operations.
>>> What happens if SUM encounters a non-numeric value, such as a string or
>>> date or unbound?  Because SUM works on a single value space, simply
>>> ignoring nonsensical values is a possible design.
>>> But MIN and MAX are different in that they have answers in different
>>> value spaces.  MIN over numbers gives a number, MIN over dateTimes
>>> gives a dateTime, etc.
>>> Data in RDF can be of mixed datatypes: experience with data,
>>> especially combined from different sources, shows that representations
>>> can vary.  In one place a dc:date property might be an XSD date, but
>>> elsewhere it might be a string (all too common).
>>> If the type for the MIN operation is known by the application, then it
>>> can explicitly cast: e.g. MIN(xsd:date(?x)).  But we also need to
>>> consider what happens when the application does not force the datatype.
>>> MIN and MAX need to deal with incompatible data.
>>> Choices for dealing with this include:
>>> 1/ The value space for MIN is the value space of the first encountered
>>> datatype and everything incompatible is ignored.
>>> 2/ The value space has to be given - there is no single "MIN" operation:
>>> e.g. MIN(xsd:dateTime, ?x)
>>> 3/ There is one answer per group for each datatype encountered in the
>>> group.  This means multiple rows per group.
>>> 4/ Error.  No query results at all.
>> Thanks, Andy. I'd like to suggest that there's a 5th option as well
>> (which is what Glitter currently does):
>> 5/ MIN and MAX are defined as per ORDER BY in the existing spec (
>> ) - for ORDER BY, the spec augments the '<' operator with a relative
>> ordering of types of RDF terms. This does not provide a total ordering,
>> and the spec. explicitly says that orderings in the unspecified cases
>> are undefined.
>> Effectively, (5) is saying to define MIN(?x) as the value of ?x in the
>> solution given by processing the group of solutions via
>> ORDER BY ASC(?x) LIMIT 1.
>> Lee
> Hi Lee,
> That can be added for URI vs literal etc.  My examples are all between
> literals in different value spaces where "<" can't sensibly be extended,
> like string and xsd:dateTime.
> What does Glitter do in this case?  Does it give an answer?  An error?
> ARQ's ORDER does in fact always impose a total ordering on ORDER BY  
> (considering the spelling of datatype IRIs and lexical forms if  
> necessary).  I don't think that is helpful here (aggregates) because  
> if a stray type occurs it can mask the expected answer.

My systems give a total ordering too. I suspect many do.
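
As an illustration only, here is a minimal Python sketch of option 5 under a
total ordering. It is not how any of the systems mentioned in this thread is
implemented: the (datatype IRI, value) model of a literal and the choice to
order by datatype IRI first are assumptions made for the example. It also
shows the masking effect Andy describes, where a stray type becomes the MAX.

```python
# Sketch of option 5: MIN/MAX taken from a total ordering over mixed literals.
# Literals are modelled as (datatype_iri, value) pairs; ordering by datatype
# IRI first is an assumed tie-break, not something the SPARQL spec defines.
XSD = "http://www.w3.org/2001/XMLSchema#"


def order_key(literal):
    datatype, value = literal
    # Compare datatype IRIs first, then values within a single datatype.
    return (datatype, value)


def agg_min(literals):
    # Analogue of ORDER BY ASC(?x) LIMIT 1 under this ordering.
    return min(literals, key=order_key)


def agg_max(literals):
    # Analogue of ORDER BY DESC(?x) LIMIT 1 under this ordering.
    return max(literals, key=order_key)


values = [
    (XSD + "integer", 3),
    (XSD + "integer", 42),
    (XSD + "string", "n/a"),  # stray value of a different type
]

print(agg_min(values))  # the integer 3
print(agg_max(values))  # the stray string "n/a", not the integer 42
```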

If you want to limit the value space, you can always explicitly cast.
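
A rough Python analogue of MIN(xsd:date(?x)) is sketched below. The helper
names are hypothetical, and silently skipping values that fail to cast is an
assumption of the sketch; this thread has not settled what a failed cast
inside an aggregate should do.

```python
from datetime import date


def cast_date(lexical):
    # Return a date for an ISO 8601 lexical form, or None if the cast fails
    # (non-date strings raise ValueError, unbound/None raises TypeError).
    try:
        return date.fromisoformat(lexical)
    except (TypeError, ValueError):
        return None


def min_date(lexicals):
    # Analogue of MIN(xsd:date(?x)); skipping failed casts is an assumption.
    dates = [d for d in map(cast_date, lexicals) if d is not None]
    return min(dates) if dates else None


print(min_date(["2009-05-11", "last Tuesday", "2008-12-01", None]))
# prints 2008-12-01
```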

My preferences are 5 or 4, plus encouraging people to cast, just as they
should when using < and co. on data found in the wild.
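
For comparison, option 4 could look roughly like the following Python sketch.
The function name and the (datatype IRI, value) model of a literal are
hypothetical, and raising an exception stands in for "no query results at
all".

```python
def strict_min(literals):
    # literals are (datatype_iri, value) pairs; this model is an assumption.
    datatypes = {datatype for datatype, _ in literals}
    if len(datatypes) > 1:
        # Option 4: mixed datatypes in one group are an error.
        raise ValueError(
            "mixed datatypes in group: " + ", ".join(sorted(datatypes))
        )
    # With at most one datatype present, compare on the value alone.
    return min(literals, key=lambda lit: lit[1]) if literals else None
```

A caller that wants an answer from mixed data then has to cast explicitly
before aggregating.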

- Steve

Steve Harris
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10  

Received on Tuesday, 12 May 2009 09:44:47 UTC