Re: [TF-LIB] Finalizing built-ins from Paul Gearon on 2010-02-23 (public-rdf-dawg@w3.org from January to March 2010)

From: Paul Gearon <gearon@ieee.org>
Date: Tue, 23 Feb 2010 11:49:01 -0500
To: Steve Harris <steve.harris@garlik.com>
Cc: Andy Seaborne <andy.seaborne@talis.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <a25ac1f1002230849q6f3ca942g4c9ab9a16818819e@mail.gmail.com>
Responding to both Andy and Steve at the same time...

On Tue, Feb 23, 2010 at 8:33 AM, Steve Harris <steve.harris@garlik.com> wrote:
> On 23 Feb 2010, at 13:14, Andy Seaborne wrote:
>
>> In an effort to progress the syntax issues ...
>>
>> http://www.w3.org/2009/sparq/wiki/Design:FunctionLibrary#SPARQL_specific
>>
>> We have some built-in functions proposed:
>>
>> ** BNODE() -> fresh blank node every call.
>>
>> ** BNODE(string) -> same blank node as other use of BNODE(string)
>>
>>  Scope is per binding (row) so
>>  BNODE("a") is like _:a in CONSTRUCT templates.
>>
>> Do we also want bnodes scoped to the whole result set?
>
> Ah, maybe, I think I'd thought that's what this function did. Result-set
> scoped bNodes are not something you can mint currently.

I may be lost here, so I'll ask for clarification.

What do BNODE() and BNODE(str) do exactly? It's not all that useful in
a WHERE clause, since it's creating a bnode that won't match anything,
so I mostly see blank nodes in CONSTRUCT. We already have _:label and
[], so what do BNODE() and BNODE(str) do that these other
representations do not?
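
As a rough illustration of the per-row scoping Andy describes, here is a minimal Python sketch. The names `RowScope` and `fresh_bnode` are hypothetical and the `_:bN` label format is an assumption; nothing here comes from the proposal text itself.

```python
import itertools

_counter = itertools.count()

def fresh_bnode():
    """Mint a blank node label that has not been handed out before."""
    return f"_:b{next(_counter)}"

class RowScope:
    """Label map for one solution row (hypothetical helper)."""

    def __init__(self):
        self._labels = {}

    def bnode(self, label=None):
        # BNODE() with no argument: always a fresh node.
        if label is None:
            return fresh_bnode()
        # BNODE(str): the same node for the same string within this row.
        if label not in self._labels:
            self._labels[label] = fresh_bnode()
        return self._labels[label]

row1, row2 = RowScope(), RowScope()
```

Under this reading, `row1.bnode("a")` is stable within `row1` but differs from `row2.bnode("a")`. Result-set scoping would amount to sharing one `RowScope` across all rows.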

> Related: do we have any consensus around a skolemisation function?
> SKOLEMISE(?bnode) -> URI.
> I'm not sure how many people want / could support such a thing.

This would be easy to do (at least for me), but for what purpose? I
agree with Andy's later suggestion of using something like
urn:x-bnode:label (though I liked the notion of "purloining" the urn
scheme :-)

The only use case I can think of for skolemising would be if a
de-skolemising function were also available. However, that would be
highly application specific and non-portable. This would break all
sorts of notions of what blank nodes are, but OTOH, it's one of the
most common feature requests I see (one that I've resisted until now).

To summarize, I don't see a use case for skolemising on its own, so I
wouldn't support it. As for skolemise/deskolemise, I'm against it from
a modeling perspective, but I also appreciate that it solves a lot of
practical problems for people, so I'd be willing to support it if it
had general support.
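
To make the skolemise/deskolemise pairing concrete, here is a minimal Python sketch. It assumes the `urn:x-bnode:label` form mentioned above and represents blank nodes as `_:label` strings; the function names are hypothetical, and a real store would have to decide which IRIs it is willing to turn back into blank nodes.

```python
SKOLEM_PREFIX = "urn:x-bnode:"  # assumed scheme, per the suggestion above

def skolemise(bnode):
    """Map a blank node such as '_:a' to an IRI string."""
    if not bnode.startswith("_:"):
        raise ValueError(f"not a blank node: {bnode!r}")
    return SKOLEM_PREFIX + bnode[2:]

def deskolemise(iri):
    """Map a skolem IRI back to a blank node; leave other IRIs untouched."""
    if iri.startswith(SKOLEM_PREFIX):
        return "_:" + iri[len(SKOLEM_PREFIX):]
    return iri
```

The round trip only works inside a system that agrees on the prefix, which is the application-specific, non-portable part.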

>> Replace the meaning of BNODE("label") or have another form?
>
> We maybe need both forms. Axel has some usecase IIRC?
>
>> ** LITERAL(str) ->
>>
>>  This is a restricted STR(x)
>>  I propose dropping LITERAL/1

How is it restricted? It's the same, isn't it?

>> ** LITERAL(str, IRI)
>>
>> This is a dynamic cast - currently, casts are done as function calls of
>> one argument so the datatype IRI is fixed at parse time.
>>
>> This would be possible:
>> ?str = "IV"
>> ?dt  = my:RomanNumeral
>>
>> then a call of
>> LITERAL(?str , ?dt) ==> "IV"^^my:RomanNumeral
>>
>> This one is a special case of calling a function dynamically (the IRI is
>> not known at parse/compile time).

I'm happy with this, since it's easy to construct these things as you
go. Would the syntax specify that the second parameter has to be a
variable? Or will a literal IRI for the datatype be an exact
equivalent of the datatype cast?

>> We could support dynamic function call. Possibilities include:
>>
>>  ?function(?arg1, ?arg2, ....)
>>  CALL(?function, ?arg1, ?arg2, ....)
>
> Getting perl flashbacks. And not in a good way :)

lol.

The first one looks nicer, but I like the explicitness of the CALL(...) syntax.

> I can imagine this playing havoc with people's extension function
> implementations, and security models.

I'm not sure how the first version would play out, but the second one
should be fine. In my case I'd do it by nesting a dynamically created
function call in a static one.
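
One way to picture the CALL(...) form is as a single static function that looks up its target at evaluation time. This is only a sketch: the `REGISTRY` contents and the `http://example.org/fn#` IRIs are made up for illustration and do not correspond to any real function library.

```python
# Hypothetical registry of extension functions, keyed by IRI.
REGISTRY = {
    "http://example.org/fn#upper": lambda s: s.upper(),
    "http://example.org/fn#concat": lambda a, b: a + b,
}

def call(function_iri, *args):
    """Static entry point that dispatches to a function chosen at run time."""
    try:
        fn = REGISTRY[function_iri]
    except KeyError:
        # Unknown IRIs are rejected rather than resolved by some other means.
        raise LookupError(f"unknown function: {function_iri}") from None
    return fn(*args)
```

Only functions already present in the registry are callable, so the set of reachable code is fixed by whoever populates it.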

Security models are a different matter, but I think it comes down to
what you allow your server to load up (unless you want to allow users
to define functions using a script found in a literal!). In my case,
all functions are found in jar files that are provided when the server
is started up in a JVM. Anyone who has access to set up these jars
already has sufficient access to make security at this level irrelevant.

If you want different users to have different access to various
functions, then that will have to be managed beyond the SPARQL spec
(as it currently stands).

>> Then LITERAL(str, IRI) is not needed:
>>
>>  ?dt(?str)
>>
>> and it follows the form of:
>>
>>  xsd:integer(?str2)
>
> We'd still need a function for language tags though, right? Could be a 2-arg
> form of STR() I guess.
>
>> ** LITERAL(str, string) -> literal with language tag
>>
>> This is the one remaining literal constructor.  I don't have a good name
>> for it.  Even if we have LITERAL(?str, ?datatype) I think we need a
>> different name because two nearly identical functions are just switching on
>> the type of their second argument. I don't see a UC to make that useful - I
>> do see possible confusion as people do mix strings and IRIs.

It's also confusing since it would get dynamically bound. Most
languages don't do that, so it would be unexpected.

>> We already have LANG(?lit) built-in meaning get the language tag.
>>
>> Suggestions for a name?
>>
>>  LANG(?str, ?langTag)

No, since the function name implies that you're retrieving the
language tag, not generating a literal.

>>  LITERAL(?str, ?langTag)

Not keen on this because of the dynamic binding implication.

>>  other?
> Prefer STRDT(?str, ?dt) and STRLANG(?str, ?lang) or something similar, for
> consistency. Agreed that making this polymorphic is maybe not a good idea.
> No strong feelings though.


I like STRLANG(?str, ?lang) since it's short (and nicer than
LITERALLANG or LITERAL_LANG).

I don't like STRDT since you're not creating a string (unless your
data type is xsd:string, of course). OTOH, STRLANG is always going to
create a string, which is why I like STR in the name.
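
For comparison, the two non-polymorphic constructors could look like the following Python sketch. The `Literal` tuple is a hypothetical stand-in for an RDF literal, and the names `strdt` and `strlang` simply mirror the suggestions quoted above.

```python
from typing import NamedTuple, Optional

class Literal(NamedTuple):
    """Hypothetical RDF literal: lexical form plus datatype or language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

def strdt(lexical, datatype_iri):
    """Build a typed literal; the datatype IRI is supplied at run time."""
    return Literal(lexical, datatype=datatype_iri)

def strlang(lexical, lang_tag):
    """Build a language-tagged literal."""
    return Literal(lexical, lang=lang_tag)
```

With separate names, nothing has to inspect the type of the second argument to decide which kind of literal to build.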

>> is possible.  My mild pref is LANG/2.
>>
>> ** IN
>>
>> I suggest IN(arg,....) and NOT IN(arg,....)
>
> Full syntax would be "val [NOT] IN (arg,...)" - right? IN (...) on its own
> would not be legal.

That's how I understand it.

>> ** IF
>> Yes
>
> +1, if it's a ternary IF(condition, value-if-true, value-if-false) type if.

+1 to that too.

>> ** COALESCE
>> Yes
>
> +1

I haven't been looking at aggregates yet (though it looks like they'll
be built on our current aggregates, so "whew"). So I'll abstain for
the time being.

>> BETWEEN has not had much support.
>
> -1 to BETWEEN

I like it, but I'm OK if it's not there. We currently search for pairs
of greater-than and less-than and convert them into a single operation
anyway.
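
The kind of rewrite meant here can be sketched in Python as follows. This is an illustrative simplification, not the actual implementation: filters are assumed to be `(variable, operator, value)` tuples, and if a variable has several lower or upper bounds the last one seen simply wins.

```python
def merge_ranges(filters):
    """Pair lower and upper bounds on the same variable into one range.

    Returns {var: (low, low_inclusive, high, high_inclusive)} for every
    variable that has both a lower and an upper bound.
    """
    lows, highs = {}, {}
    for var, op, value in filters:
        if op in (">", ">="):
            lows[var] = (value, op == ">=")
        elif op in ("<", "<="):
            highs[var] = (value, op == "<=")
    # Variables bounded on one side only are left for normal evaluation.
    return {var: lows[var] + highs[var] for var in lows if var in highs}

ranges = merge_ranges([("?x", ">=", 1), ("?x", "<", 10), ("?y", ">", 0)])
```

Here `?x` ends up with a single range covering both comparisons, while `?y` has no upper bound and is not merged.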

>> If this is OK, some TF-LIB tasks remaining are:
>>
>> 1/ Make URIs for all built-ins
>> 2/ Choose subset of F&O as the common library.
>> 3/ Test cases
>> 4/ Document built-ins in (what was) sec 11.
>>
>> Is it worth splitting out sec 11 which is quite large already?
>
> Splitting out to?

While one huge doc is not very nice to deal with, the spec is already
split up into several docs. This is a pain from an implementation
point of view. I'd prefer to keep this with the original, unless it
really gets out of control.

Regards,
Paul Gearon
Received on Tuesday, 23 February 2010 16:49:37 UTC