SPARQL 1.1 Aggregates from Toby Inkster on 2009-11-30 (public-rdf-dawg-comments@w3.org from November 2009)

From: Toby Inkster <tai@g5n.co.uk>
Date: Mon, 30 Nov 2009 09:46:24 +0000
To: public-rdf-dawg-comments@w3.org
Message-ID: <1259574385.2511.500.camel@ophelia2.g5n.co.uk>

I see that the built-in set of aggregates for SPARQL 1.1 has not yet
been decided.

The current list is quite numerically oriented. Here are some I'd like
to see:

 CONCAT - concatenates values, with an optional second
  parameter to provide a joiner character. Result
  is a plain literal with no language.

 XML_CONCAT - concatenates values into an XMLLiteral
  using a SPARQL-Results-like structure.

 LONGEST/SHORTEST - returns the longest or shortest
  result (in terms of character count). Optional
  second parameter specifies a language.

 MODE/MEDIAN - while AVG returns the mean result, these
  two would return other kinds of average. With
  named graphs, the same triple can occur
  multiple times, so MODE makes sense. Optional
  second parameter specifies a language.
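
As a rough illustration of the intended semantics of LONGEST, SHORTEST,
MODE and MEDIAN (ignoring the language parameter), here is a small Python
sketch. The sample values and the helper names `longest` and `shortest`
are made up for this example; nothing here comes from a SPARQL draft.

```python
from statistics import median, mode

# Hypothetical bindings for ?label collected across three named graphs;
# the same triple occurs in two of them, so "cat" appears twice.
labels = ["cat", "feline", "cat"]


def longest(values):
    """LONGEST: the value with the highest character count."""
    return max(values, key=len)


def shortest(values):
    """SHORTEST: the value with the lowest character count."""
    return min(values, key=len)


print(mode(labels))        # most frequent value
print(longest(labels))
print(shortest(labels))
print(median([3, 1, 7]))   # middle value of a numeric list
```

How ties should be broken (for example, two labels of equal length) is
left open here; the sketch simply keeps whichever value comes first.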

In the case where I've indicated that the second parameter specifies a
language, the aggregate function would work like this:

 1. Do any values in the list match the specified language?
  (Using same definition of "match" as langMatches.)
  If so, then discard any results which don't match.

 2. Run the aggregate as normal.

So for example, on the following graph:

 <http://example.com/cat>
  rdfs:label "cat"@en, "chat"@fr, "feline"@en, "felis"@la.

This SPARQL query:

 SELECT ?resource (SHORTEST(?label,"fr") AS ?mylabel)
 WHERE { ?resource rdfs:label ?label . }
 GROUP BY ?resource

Would return:

 resource                 | mylabel
 -------------------------+-----------
 <http://example.com/cat> | "chat"@fr

The non-French values are discarded, and the shortest remaining label
is selected. However, this:

 SELECT ?resource (SHORTEST(?label,"de") AS ?mylabel)
 WHERE { ?resource rdfs:label ?label . }
 GROUP BY ?resource

Would return:

 resource                 | mylabel
 -------------------------+-----------
 <http://example.com/cat> | "cat"@en

There is no German label in the data, so the discarding step never
happens - thus the shortest label of any language is selected.
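
The two-step behaviour and both results above can be sketched in Python
as follows. This is only an illustration under assumptions: the `Literal`
type and the function names are invented for the sketch, and
`lang_matches` approximates langMatches using RFC 4647 basic filtering.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal with a language tag."""
    value: str
    lang: Optional[str] = None


def lang_matches(tag, lang_range):
    """Approximate langMatches: RFC 4647 basic filtering, case-insensitive."""
    if not tag:
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


def prefer_language(values, lang_range):
    """Step 1: keep only matching values, but only if at least one matches."""
    matching = [v for v in values if lang_matches(v.lang, lang_range)]
    return matching if matching else list(values)


def shortest(values, lang_range=None):
    """Step 2: run the aggregate as normal on whatever step 1 left."""
    if lang_range is not None:
        values = prefer_language(values, lang_range)
    return min(values, key=lambda v: len(v.value))


labels = [
    Literal("cat", "en"),
    Literal("chat", "fr"),
    Literal("feline", "en"),
    Literal("felis", "la"),
]

print(shortest(labels, "fr"))  # French label exists, so others are dropped
print(shortest(labels, "de"))  # no German label, so nothing is dropped
```

Because `prefer_language` falls back to the full list when nothing
matches, the language acts as a preference rather than a filter.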

I think in terms of presenting views of graph data, having these
aggregate language preferences (and they're preferences, not filters, as
the second example illustrates) would be very useful - especially for
"label" and "description" kinds of fields.

While I'm giving examples, I'll provide some for CONCAT and XML_CONCAT:

 SELECT
  ?resource
  (CONCAT(?label, ";") AS ?concat)
  (XML_CONCAT(?label) AS ?xmlconcat)
 WHERE { ?resource rdfs:label ?label . }
 GROUP BY ?resource
 ORDER BY ?label

?concat would be "cat;chat;feline;felis" (the ORDER BY clause having
been used by the aggregate function). ?xmlconcat would be:

"""<literal xml:lang="en">cat</literal>
<literal xml:lang="fr">chat</literal>
<literal xml:lang="en">feline</literal>
<literal xml:lang="la">felis</literal>"""^^rdf:XMLLiteral 

Perhaps the data type could be more specialised - instead of
rdf:XMLLiteral, it could be, say, sparql:XMLResultsLiteral, which SPARQL
libraries could recognise and automagically parse for you.
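
For illustration only, the CONCAT and XML_CONCAT results above could be
produced by something like the following Python sketch. The function
names, the (value, language) tuple representation and the use of
`sorted` to stand in for ORDER BY ?label are all assumptions of the
sketch, not part of any proposal text.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical (value, language) pairs for the labels of the cat resource.
labels = [("chat", "fr"), ("felis", "la"), ("cat", "en"), ("feline", "en")]


def concat(values, joiner=""):
    """CONCAT: join the lexical values; language tags are dropped."""
    return joiner.join(value for value, _lang in values)


def xml_concat(values):
    """XML_CONCAT: one <literal> element per value, keeping xml:lang."""
    lines = []
    for value, lang in values:
        attr = f" xml:lang={quoteattr(lang)}" if lang else ""
        lines.append(f"<literal{attr}>{escape(value)}</literal>")
    return "\n".join(lines)


# Sorting by lexical value stands in for ORDER BY ?label.
ordered = sorted(labels, key=lambda pair: pair[0])

print(concat(ordered, ";"))
print(xml_concat(ordered))
```

Escaping the values matters here, since a label containing "<" or "&"
would otherwise produce an ill-formed XML literal.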

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
Received on Monday, 30 November 2009 09:47:14 UTC