- From: Steve Harris <steve.harris@garlik.com>
- Date: Tue, 12 May 2009 10:44:08 +0100
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- Cc: Lee Feigenbaum <lee@thefigtrees.net>, SPARQL Working Group <public-rdf-dawg@w3.org>
On 11 May 2009, at 16:47, Seaborne, Andy wrote:

>> -----Original Message-----
>> From: Lee Feigenbaum [mailto:figtree@gmail.com] On Behalf Of Lee Feigenbaum
>> Sent: 11 May 2009 16:28
>> To: Seaborne, Andy
>> Cc: SPARQL Working Group
>> Subject: Re: ACTION-24: aggregate functions with multiple answers
>>
>> Seaborne, Andy wrote:
>>> ACTION-24: Explain potential design regarding aggregate functions with
>>> multiple answers for mixed datatypes re ISSUE-16
>>>
>>> Aggregate functions like MIN, MAX, SUM - in fact, most except COUNT -
>>> operate on the value space of a variable binding or the value of an
>>> expression, which is a value. Even COUNT of terms (COUNT(?x)) needs to
>>> deal with unbound variables (by skipping?).
>>>
>>> SUM(?x) requires that ?x is numeric, presumably according to the type
>>> promotion rules for XSD arithmetic operations.
>>>
>>> http://www.w3.org/TR/xpath-functions/#op.numeric
>>>
>>> What happens if SUM encounters a non-numeric value, such as a string or
>>> date or unbound? Because SUM works on a single value space, simply
>>> ignoring nonsensical values is a possible design.
>>>
>>> But MIN and MAX are different in that they have answers in different
>>> value spaces. MIN over numbers gives a number, MIN over dateTimes
>>> gives a dateTime, etc.
>>>
>>> Data in RDF can be of mixed datatypes: experience with data,
>>> especially combined from different sources, shows that representations
>>> can vary. In one place a dc:date property might be an XSD date, but
>>> elsewhere it might be a string (all too common).
>>>
>>> If the type for the MIN operation is known by the application, then it
>>> can explicitly cast: e.g. MIN(xsd:date(?x)). But we also need to
>>> consider what happens when the application does not force the datatype.
>>> MIN and MAX need to deal with incompatible data.
>>>
>>> Choices for dealing with this include:
>>>
>>> 1/ The value space for MIN is the value space of the first encountered
>>> datatype and everything incompatible is ignored.
>>>
>>> 2/ The value space has to be given - there is no single "MIN" operation:
>>> e.g. MIN(xsd:dateTime, ?x)
>>>
>>> 3/ There is one answer per group for each datatype encountered in the
>>> group. This means multiple rows per group.
>>>
>>> 4/ Error. No query results at all.
>>
>> Thanks, Andy. I'd like to suggest that there's a 5th option as well
>> (which is what Glitter currently does):
>>
>> 5/ MIN and MAX are defined as per ORDER BY in the existing spec
>> ( http://www.w3.org/TR/rdf-sparql-query/#modOrderBy ) - for ORDER BY, the
>> spec augments the '<' operator with a relative ordering of types of RDF
>> terms. This does not provide a total ordering, and the spec explicitly
>> says that orderings in the unspecified cases are undefined.
>>
>> Effectively, (5) is saying to define MIN(?x) as the value of ?x in the
>> solution given by processing the group of solutions via
>> ORDER BY ASC(?x) LIMIT 1.
>>
>> Lee
>
> Hi Lee,
>
> That can be added for URI vs literal etc. My examples are all between
> literals in different value spaces where "<" can't sensibly be extended,
> like string and xsd:dateTime.
>
> What does Glitter do in this case? Does it give an answer? An error?
>
> ARQ's ORDER does in fact always impose a total ordering on ORDER BY
> (considering the spelling of datatype IRIs and lexical forms if
> necessary). I don't think that is helpful here (aggregates) because
> if a stray type occurs it can mask the expected answer.

My systems give a total ordering too. I suspect many do.

If you want to limit the value space, you can always explicitly cast.

My preferences are 5 or 4, and encouraging people to cast, just as they
should when using < and co. on data found in the wild.
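To make the trade-off concrete, here is a minimal Python sketch (not taken from any SPARQL engine) contrasting option 5 with an explicit cast. The sample values, the `TYPE_RANK` table, and the helper names are all hypothetical; the ranking of integers before strings before dates is an arbitrary assumption standing in for whatever total order an implementation happens to choose.

```python
from datetime import date

# Hypothetical bindings for ?x with mixed datatypes, as found "in the wild".
values = [date(2009, 5, 12), "2007-06-01", "12 May 2009", date(2008, 1, 3), 42]

# Assumed ranking of types used to extend "<" to a total order.
# Illustrative only: real engines pick their own order for undefined cases.
TYPE_RANK = {int: 0, str: 1, date: 2}

def total_order_key(v):
    """Sort key: compare by type rank first, then by value within one type."""
    return (TYPE_RANK[type(v)], v)

def min_by_order(vals):
    """Option 5: MIN(?x) behaves like ORDER BY ASC(?x) LIMIT 1."""
    return min(vals, key=total_order_key)

def cast_to_date(v):
    """Rough analogue of xsd:date(?x): return a date, or None if the cast fails."""
    if isinstance(v, date):
        return v
    if isinstance(v, str):
        try:
            return date.fromisoformat(v)
        except ValueError:
            return None
    return None

def min_with_cast(vals):
    """Explicit cast: drop values that cannot be cast, then take MIN."""
    kept = [d for d in (cast_to_date(v) for v in vals) if d is not None]
    return min(kept) if kept else None

print(min_by_order(values))   # the stray integer wins under the assumed order
print(min_with_cast(values))  # the earliest value that casts to a date
```

Under the assumed ordering, the stray integer masks the expected date, which is the concern raised about total orderings; casting restricts the comparison to a single value space at the cost of silently dropping values that do not cast.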
- Steve

-- 
Steve Harris
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233  VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Tuesday, 12 May 2009 09:44:47 UTC