- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 11 May 2009 15:47:19 +0000
- To: Lee Feigenbaum <lee@thefigtrees.net>
- CC: SPARQL Working Group <public-rdf-dawg@w3.org>
> -----Original Message-----
> From: Lee Feigenbaum [mailto:figtree@gmail.com] On Behalf Of Lee Feigenbaum
> Sent: 11 May 2009 16:28
> To: Seaborne, Andy
> Cc: SPARQL Working Group
> Subject: Re: ACTION-24: aggregate functions with multiple answers
>
> Seaborne, Andy wrote:
> > ACTION-24: Explain potential design regarding aggregate functions with multiple answers for mixed datatypes re ISSUE-16
> >
> > Aggregate functions like MIN, MAX, SUM - in fact, most except COUNT - operate on the value space of a variable binding or the value of an expression, which is a value. Even COUNT of terms (COUNT(?x)) needs to deal with unbound variables (by skipping?).
> >
> > SUM(?x) requires that ?x is numeric, presumably according to the type promotion rules for XSD arithmetic operations.
> >
> > http://www.w3.org/TR/xpath-functions/#op.numeric
> >
> > What happens if SUM encounters a non-numeric value, such as a string or date, or an unbound variable? Because SUM works on a single value space, simply ignoring nonsensical values is a possible design.
> >
> > But MIN and MAX are different in that they have answers in different value spaces. MIN over numbers gives a number, MIN over dateTimes gives a dateTime, etc.
> >
> > Data in RDF can be of mixed datatypes: experience with data, especially combined from different sources, shows that representations can vary. In one place a dc:date property might be an XSD date, but elsewhere it might be a string (all too common).
> >
> > If the type for the MIN operation is known by the application, then it can explicitly cast: e.g. MIN(xsd:date(?x)). But we also need to consider what happens when the application does not force the datatype. MIN and MAX need to deal with incompatible data.
> >
> > Choices for dealing with this include:
> >
> > 1/ The value space for MIN is the value space of the first encountered datatype and everything incompatible is ignored.
> >
> > 2/ The value space has to be given - there is no single "MIN" operation:
> > e.g. MIN(xsd:dateTime, ?x)
> >
> > 3/ There is one answer per group for each datatype encountered in the group. This means multiple rows per group.
> >
> > 4/ Error. No query results at all.
>
> Thanks, Andy. I'd like to suggest that there's a 5th option as well (which is what Glitter currently does):
>
> 5/ MIN and MAX are defined as per ORDER BY in the existing spec (http://www.w3.org/TR/rdf-sparql-query/#modOrderBy) - for ORDER BY, the spec augments the '<' operator with a relative ordering of types of RDF terms. This does not provide a total ordering, and the spec explicitly says that orderings in the unspecified cases are undefined.
>
> Effectively, (5) is saying to define MIN(?x) as the value of ?x in the solution given by processing the group of solutions via ORDER BY ASC(?x) LIMIT 1.
>
> Lee

Hi Lee,

That can be added for URI vs literal etc. My examples are all between literals in different value spaces where "<" can't sensibly be extended, like string and xsd:dateTime. What does Glitter do in this case? Does it give an answer? An error?

ARQ does in fact always impose a total ordering on ORDER BY (considering the spelling of datatype IRIs and lexical forms if necessary). I don't think that is helpful here (aggregates) because if a stray type occurs it can mask the expected answer.

Andy

> >
> > Even in (3), literals of unknown (not understood by this process) datatype, and unbound variables, would be ignored. Warnings are up to the implementation, but the results are the same for all processors.
> >
> > (1) has the unfortunate effect that the answer can change depending on the order in which data is encountered, so it isn't fixed even for a single query processor.
> >
> > (4) is hard for scaling - the error may be encountered at the end of the data when some results were ready much earlier but can't be sent until the query is known to be successful. This is an effect of HTTP requiring the return code first - "200 OK" is seen as promising results, not an error half way through.
> >
> > Andy
> >
> > --------------------------------------------
> > Hewlett-Packard Limited
> > Registered Office: Cain Road, Bracknell, Berks RG12 1HN
> > Registered No: 690597 England
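The MIN choices discussed in this thread (1, 3, 4 and Lee's option 5), plus a SUM that skips non-numeric values, can be compared side by side in a small Python sketch. It is not from the thread, and several pieces are assumptions for illustration only: the term model (a literal as a (lexical form, datatype IRI) pair, None for an unbound variable), the handful of "understood" datatypes, the reduction of XSD numeric promotion to mixing xsd:integer and xsd:decimal, and the total order used for option 5 (value-space name first, then value). That order is neither ARQ's actual ordering nor anything the SPARQL spec defines; it only shows how any total order lets a stray type win.

```python
from datetime import date
from decimal import Decimal
from typing import Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical term model for this sketch: a literal is
# (lexical_form, datatype_iri); None stands for an unbound variable.
Term = Optional[Tuple[str, str]]

# The only datatypes this sketch "understands": datatype IRI ->
# (value space name, parser from lexical form to value).
# Ill-typed lexical forms are not handled and would raise.
PARSERS = {
    XSD + "integer": ("numeric", int),
    XSD + "decimal": ("numeric", Decimal),
    XSD + "date": ("date", date.fromisoformat),
    XSD + "string": ("string", str),
}


def to_value(term: Term):
    """Return (value_space, value); None for unbound or unknown datatypes."""
    if term is None:
        return None
    lexical, datatype = term
    entry = PARSERS.get(datatype)
    if entry is None:
        return None
    space, parse = entry
    return space, parse(lexical)


def agg_sum(terms):
    """SUM over the numeric value space only; everything else is skipped."""
    total = 0
    for sv in map(to_value, terms):
        if sv is not None and sv[0] == "numeric":
            total += sv[1]  # int + Decimal stands in for XSD type promotion
    return total


def min_first_type(terms):
    """Choice 1: the first usable term fixes the value space (order-dependent)."""
    space, best = None, None
    for sv in map(to_value, terms):
        if sv is None:
            continue
        if space is None:
            space, best = sv
        elif sv[0] == space and sv[1] < best:
            best = sv[1]
    return best


def min_per_type(terms):
    """Choice 3: one minimum per value space, i.e. possibly several rows."""
    mins = {}
    for sv in map(to_value, terms):
        if sv is None:
            continue
        space, value = sv
        if space not in mins or value < mins[space]:
            mins[space] = value
    return mins


def min_strict(terms):
    """Choice 4: mixing value spaces is an error for the whole query."""
    mins = min_per_type(terms)
    if len(mins) > 1:
        raise ValueError("incompatible datatypes: " + ", ".join(sorted(mins)))
    return next(iter(mins.values()), None)


def min_order_by(terms):
    """Option 5 sketch: ORDER BY ASC(?x) LIMIT 1 under an assumed total
    order that compares value-space names first, then values."""
    keys = [sv for sv in map(to_value, terms) if sv is not None]
    return min(keys)[1] if keys else None


if __name__ == "__main__":
    group = [
        ("May 2009", XSD + "string"),  # stray string in a date-valued column
        ("2009-05-11", XSD + "date"),
        ("2008-01-02", XSD + "date"),
        None,  # unbound: skipped
        ("x", "http://example.org/unknownType"),  # not understood: skipped
    ]
    print(min_first_type(group))                  # May 2009 (string seen first)
    print(min_first_type(list(reversed(group))))  # 2008-01-02 (date seen first)
    print(min_per_type(group))                    # one entry per value space
    print(min_order_by(group))                    # 2008-01-02 ("date" < "string")
```

Running it shows choice 1 returning the stray string for one ordering of the group and the earlier date for the reverse ordering, which is the order dependence noted for (1). min_per_type returns one minimum per value space as in (3), and min_strict raises for the same group as in (4). Calling min_order_by on a group of integers that also contains a single xsd:date returns the date, because the assumed total order puts the "date" value space first: the masking effect described for a stray type.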
Received on Monday, 11 May 2009 15:48:30 UTC