SPARQL 1.1 Aggregates

From: Toby Inkster <tai@g5n.co.uk>
Date: Mon, 30 Nov 2009 09:46:24 +0000
To: public-rdf-dawg-comments@w3.org
Message-ID: <1259574385.2511.500.camel@ophelia2.g5n.co.uk>
I see that the built-in set of aggregates for SPARQL 1.1 has not yet
been decided.

The current list is quite numerically oriented. Here are some I'd like
to see:

	CONCAT - concatenates values, with an optional second
		parameter to provide a joiner character. Result
		is a plain literal with no language.

	XML_CONCAT - concatenates values into an XMLLiteral
		using a SPARQL-Results-like structure.

	LONGEST/SHORTEST - returns the longest or shortest
		result (in terms of character count). Optional
		second parameter specifies a language.

	MODE/MEDIAN - while AVG returns the mean result, these
		two would return other kinds of average. With
		named graphs, the same triple can occur
		multiple times, so MODE makes sense. Optional
		second parameter specifies a language.
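
To make the MODE/MEDIAN idea concrete, here is a minimal Python sketch. It
is not taken from any SPARQL draft. It assumes the aggregate sees one value
per matching solution, so a triple asserted in several named graphs
contributes once per graph. The graph names, the numeric values and the
helper name mode() are all made up for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical solutions, one per (named graph, value) match.
# The value 30 is asserted in two different named graphs, so it appears twice.
solutions = [
    ("http://example.com/graph1", 30),
    ("http://example.com/graph2", 30),
    ("http://example.com/graph3", 31),
    ("http://example.com/graph4", 45),
]

values = [value for _graph, value in solutions]


def mode(items):
    """Return the most frequent item; on a tie, the first one seen wins."""
    return Counter(items).most_common(1)[0][0]


mode_value = mode(values)                # most common value
median_value = median(values)            # middle value (mean of the two middle ones here)
mean_value = sum(values) / len(values)   # what AVG would return

print(mode_value, median_value, mean_value)
```

How ties should be broken for MODE is not covered by the proposal above;
"first seen wins" is just the behaviour of this sketch.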

In the case where I've indicated that the second parameter specifies a
language, the aggregate function would work like this:

	1. Do any values in the list match the specified language?
		(Using same definition of "match" as langMatches.)
		If so, then discard any results which don't match.

	2. Run the aggregate as normal.

So for example, on the following graph:

	<http://example.com/cat>
		rdfs:label "cat"@en, "chat"@fr, "feline"@en, "felis"@la.

This SPARQL query:

	SELECT ?resource (SHORTEST(?label,"fr") AS ?mylabel)
	WHERE { ?resource rdfs:label ?label . }
	GROUP BY ?resource

Would return:

	resource                 | mylabel
	-------------------------+-----------
	<http://example.com/cat> | "chat"@fr

The non-French values would be discarded, and the shortest remaining
label selected. However, this:

	SELECT ?resource (SHORTEST(?label,"de") AS ?mylabel)
	WHERE { ?resource rdfs:label ?label . }
	GROUP BY ?resource

Would return:

	resource                 | mylabel
	-------------------------+-----------
	<http://example.com/cat> | "cat"@en

There is no German label in the data, so the discarding step never
happens, and the shortest label of any language is selected.
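
The two-step behaviour can be sketched in Python as below. This is only an
illustration under stated assumptions: literals are modelled as (lexical
form, language tag) pairs, lang_matches() is a simplified stand-in for
SPARQL's langMatches (basic filtering in the style of RFC 4647), and the
function names are invented for the sketch.

```python
def lang_matches(tag, lang_range):
    """Simplified langMatches: exact match or prefix match on a subtag boundary."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


def prefer_language(literals, lang_range):
    """Step 1: if anything matches the language, discard what does not match."""
    matching = [lit for lit in literals if lang_matches(lit[1], lang_range)]
    return matching if matching else list(literals)


def shortest(literals, lang_range=None):
    """Step 2: run the aggregate as normal (here: fewest characters)."""
    if lang_range is not None:
        literals = prefer_language(literals, lang_range)
    return min(literals, key=lambda lit: len(lit[0]))


# Labels of <http://example.com/cat> from the example graph.
labels = [("cat", "en"), ("chat", "fr"), ("feline", "en"), ("felis", "la")]

print(shortest(labels, "fr"))  # ('chat', 'fr')
print(shortest(labels, "de"))  # ('cat', 'en')
```

Because prefer_language() falls back to the full list when nothing matches,
the language argument acts as a preference and not as a filter.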

For presenting views of graph data, I think these aggregate language
preferences (and they're preferences, not filters, as the second example
illustrates) would be very useful - especially for "label" and
"description" kinds of fields.

While I'm giving examples, I'll provide some for CONCAT and XML_CONCAT:

	SELECT
		?resource
		(CONCAT(?label, ";") AS ?concat)
		(XML_CONCAT(?label) AS ?xmlconcat)
	WHERE { ?resource rdfs:label ?label . }
	GROUP BY ?resource
	ORDER BY ?label

?concat would be "cat;chat;feline;felis" (the ORDER BY clause having
been used by the aggregate function). ?xmlconcat would be:

"""<literal xml:lang="en">cat</literal>
<literal xml:lang="fr">chat</literal>
<literal xml:lang="en">feline</literal>
<literal xml:lang="la">felis</literal>"""^^rdf:XMLLiteral 

Perhaps the data type could be more specialised - instead of
rdf:XMLLiteral, it could be, say, sparql:XMLResultsLiteral, which SPARQL
libraries could recognise and automagically parse for you.
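
For comparison, here is a rough Python sketch of what CONCAT and XML_CONCAT
would compute for the example data. It assumes literals are (lexical form,
language tag) pairs and uses a plain sort to stand in for ORDER BY ?label.
The function names and the exact XML layout follow the example above and
are not taken from any specification.

```python
from xml.sax.saxutils import escape, quoteattr

# Labels of <http://example.com/cat> as (lexical form, language tag) pairs.
labels = [("felis", "la"), ("chat", "fr"), ("cat", "en"), ("feline", "en")]


def concat(literals, joiner=""):
    """Join the lexical forms into one plain string, dropping language tags."""
    return joiner.join(lexical for lexical, _lang in literals)


def xml_concat(literals):
    """Emit one <literal> element per value, keeping the language tag."""
    lines = []
    for lexical, lang in literals:
        attr = f" xml:lang={quoteattr(lang)}" if lang else ""
        lines.append(f"<literal{attr}>{escape(lexical)}</literal>")
    return "\n".join(lines)


# Sorting by lexical form stands in for ORDER BY ?label.
ordered = sorted(labels, key=lambda lit: lit[0])

concat_result = concat(ordered, ";")
xml_result = xml_concat(ordered)

print(concat_result)
print(xml_result)
```

Escaping the lexical forms keeps the result well-formed XML even when a
label contains characters such as "<" or "&".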

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
Received on Monday, 30 November 2009 09:47:14 GMT
