User-Defined SPARQL Functions (with JavaScript) from Holger Knublauch on 2009-03-16 (public-rdf-dawg-comments@w3.org from March 2009)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 16 Mar 2009 14:29:18 -0700
To: public-rdf-dawg-comments@w3.org
Message-Id: <7853BEB5-9C5B-4E6B-B65C-EDFADFE6112C@topquadrant.com>
Lee has asked [1] for suggestions on how to make SPARQL more  
extensible. I agree with his observation that instead of having the WG
create dozens of specific extensions, it would be more fruitful if the
WG simply defined extension points that allow anyone to
contribute those extensions themselves - assuming this happens in a
platform-independent and transparent way.

SPARQL functions (in FILTERs and assignments) already have a URI, but  
as far as I know SPARQL does not yet define a mechanism to resolve  
this URI to get some executable code. As part of the SPIN framework  
[2] we have implemented a simple mechanism in which the function's URI  
points to RDF metadata that instruct the SPARQL engine on how to  
execute the function. One way of using SPIN is to point to another  
"nested" SPARQL query template that will be executed as a function's  
body. Alternatively, in the new TopBraid Composer 3.0 beta release, we  
are introducing a capability to use JavaScript code as the body of  
such functions. I have copied my corresponding blog entry [3] under  
the references below.

Regards,
Holger

[1] http://www.thefigtrees.net/lee/blog/2009/03/evolving_standards_consensus_a.html
[2] http://spinrdf.org
[3] http://composing-the-semantic-web.blogspot.com/2009/03/using-javascript-to-define-sparql.html

---

Many real-world SPARQL queries make heavy use of built-in functions  
for tasks such as string processing and mathematical calculations.  
SPARQL comes with a pre-defined set of such built-in functions.  
However, in practice, these built-in libraries are frequently extended  
to solve specific problems that have not been anticipated by the  
language designers. Such extensions are typically implemented natively  
for a specific SPARQL execution engine, for example in Java. Needless  
to say, this is not a solution in the spirit of the (Semantic) Web,  
because it leads to a Tower of Babel with all kinds of dialects and  
platform-specific extensions.

We have proposed SPIN Functions as one possible extension mechanism
for SPARQL that allows anyone to derive new SPARQL functions by
combining other SPARQL functions and query templates. In general, SPIN
functions are Semantic Web resources that can be referenced by their  
URI to get a description of the function's arguments, return value and  
executable body. However, even this approach does not cover all  
possible use cases, because it is still limited by the lower-level  
SPARQL operations and functions. Many problems can only be solved with  
a general-purpose programming language.

TopBraid 3.0.0 beta 2 now introduces an extension of the SPIN  
functions mechanism that can be used to define new SPARQL functions  
with the help of JavaScript. In a nutshell, a SPINx function is a SPIN  
function that points to a snippet of JavaScript, or a JavaScript file,  
which will be executed whenever the function is run. The arguments of  
the SPARQL function call are made available as arguments to the  
corresponding JavaScript function. Here is a simple example:
     ex:square
       a spin:Function ;
       rdfs:subClassOf spin:Functions ;
       spin:constraint
           [ a spl:Argument ;
             rdfs:comment "The value to compute the square of" ;
             spl:predicate sp:arg1 ;
             spl:valueType xsd:float
           ] ;
       spin:returnType xsd:float ;
       spinx:javaScriptCode "return arg1 * arg1;" .
The function above can be called with an expression such as
LET (?sq := ex:square(4)) to calculate the square of the argument.
In addition to having inline code via the spinx:javaScriptCode
property, there is an option to simply link to a .js file that
contains a function with the same name.
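
As a rough illustration of the two variants, the sketch below shows how an engine might turn the inline body into a callable function, and what a linked .js file could look like. The helper name compileInlineBody is hypothetical and not part of SPIN or TopBraid; the only assumption taken from the text is that the SPARQL arguments become JavaScript arguments (here named arg1, after sp:arg1).

```javascript
// Hypothetical engine-side helper (an assumption, not TopBraid's actual API):
// wraps an inline spinx:javaScriptCode body in a function whose parameters
// are named after the SPIN arguments.
function compileInlineBody(argNames, body) {
  // The standard ECMAScript Function constructor takes the parameter names
  // followed by the function body as strings.
  return new Function(...argNames, body);
}

// Inline variant: the body string from the ex:square definition.
const squareInline = compileInlineBody(["arg1"], "return arg1 * arg1;");

// File variant: what a linked .js file might contain, namely a function
// with the same name as the SPARQL function (the local name of ex:square).
function square(arg1) {
  return arg1 * arg1;
}

console.log(squareInline(4)); // 16
console.log(square(4));       // 16
```

Either way, the SPARQL call ex:square(4) would end up invoking a plain JavaScript function with the value 4 bound to arg1.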

This simple approach (a variation of which had been proposed by Gregory
Williams) greatly extends the expressive power of SPARQL through a
relatively platform-independent mechanism - JavaScript interpreters
such as Mozilla Rhino are widely available on all major platforms. We
have selected JavaScript for three major reasons. First, JavaScript
(standardized as ECMAScript) is a well-established interpreted language
that many users are already familiar with. Second, JavaScript is
web-friendly: you can reference scripts via URLs and scripts may import
other scripts from the web as well. Finally, JavaScript has an
attractive security model that will make it more difficult to create
malicious SPARQL extensions (of course, details would need to be
fleshed out).

The expressivity of JavaScript is great for many problems that are  
beyond the capabilities of SPARQL and its built-in features. You can  
freely express if-then-else conditions, loops, sub-procedure calls,
etc., and thus gradually build up your own function libraries in a
portable fashion. A limitation right now is that the JavaScript-based  
SPARQL functions (in TopBraid) do not have any mechanism yet to access  
the current RDF graph at execution time. This would be needed to  
implement things like walking an rdf:List or computing average values;
in short, any use case that requires more background knowledge
than what has been explicitly passed into the function as arguments.
However, this limitation can be fixed, for example by defining a
small collection of built-in callback functions such as find(S,P,O)
in JavaScript. There is a large body of related work on JavaScript RDF
APIs that may also be leveraged for that purpose.
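
To make the idea of a find(S,P,O) callback more concrete, here is a minimal sketch of how a function body could walk an rdf:List and compute an average if such a callback existed. Everything in it is an assumption for illustration: the find signature (null as a wildcard, results as arrays of [subject, predicate, object]), the toy in-memory triple array standing in for the current graph, and the helper names listMembers and average.

```javascript
const RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// Toy in-memory graph standing in for the engine's current RDF graph.
const triples = [
  ["ex:scores", "ex:values", "_:l1"],
  ["_:l1", RDF + "first", 2],
  ["_:l1", RDF + "rest", "_:l2"],
  ["_:l2", RDF + "first", 4],
  ["_:l2", RDF + "rest", "_:l3"],
  ["_:l3", RDF + "first", 9],
  ["_:l3", RDF + "rest", RDF + "nil"],
];

// Hypothetical built-in callback: returns all triples matching the pattern,
// where null in any position acts as a wildcard.
function find(s, p, o) {
  return triples.filter(([ts, tp, to]) =>
    (s === null || s === ts) &&
    (p === null || p === tp) &&
    (o === null || o === to));
}

// Walks an rdf:List from its head node and collects the rdf:first values.
function listMembers(head) {
  const members = [];
  const seen = new Set(); // guards against cyclic lists
  let node = head;
  while (node !== RDF + "nil" && !seen.has(node)) {
    seen.add(node);
    const first = find(node, RDF + "first", null);
    const rest = find(node, RDF + "rest", null);
    if (first.length === 0 || rest.length === 0) break; // malformed list
    members.push(first[0][2]);
    node = rest[0][2];
  }
  return members;
}

// Possible body of a JavaScript-based SPARQL function taking a list head.
function average(arg1) {
  const values = listMembers(arg1);
  if (values.length === 0) return null;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

console.log(listMembers("_:l1")); // [ 2, 4, 9 ]
console.log(average("_:l1"));     // 5
```

A real engine would back find with its own graph API and RDF node objects rather than plain strings and numbers, so this is only meant to show the shape of the callback.
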
Received on Monday, 16 March 2009 21:30:08 UTC