Re: Details of string operations from Andy Seaborne on 2010-12-02 (public-rdf-dawg@w3.org from October to December 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 02 Dec 2010 11:56:51 +0000
To: Steve Harris <steve.harris@garlik.com>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4CF78983.1060309@epimorphics.com>

Summary of what is outstanding:

Is a mix of simple literals and XSD strings on CONCAT going to return a 
simple literal or xsd:string?

Example:

CONCAT(?var1, " -> ", ?var2)

?var1 and/or ?var2 are xsd:strings. Is the result a simple literal or
xsd:string?

Discussion below.

	Andy

On 02/12/10 10:58, Steve Harris wrote:
>> ENCODES(string)
>
> ENCODES() strikes me as strange naming, sounds like a predicate.

As Greg pointed out, it should be ENCODE.

> There are many URI encodings people might reasonably want, "full" URI (encodeURI() in Javascript 1.5), URI component (encodeURIComponent() in Javascript 1.5), plus there's also base64 etc. encoding.

This is specifically a counterpart to fn:encode-for-uri i.e. %-encoding.

The "for URI" is significant: it does not apply different rules to
different parts of the string (e.g. hostnames).
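To make the %-encoding concrete: a minimal Python sketch, assuming that `urllib.parse.quote` with `safe=""` (Python 3.7+) is an adequate stand-in for fn:encode-for-uri. It leaves only the unreserved characters (letters, digits, `-`, `_`, `.`, `~`) as-is and %-escapes everything else after UTF-8 encoding, treating the whole string as one component rather than parsing out hosts or paths. The function name `encode_for_uri` is illustrative only.

```python
from urllib.parse import quote


def encode_for_uri(s: str) -> str:
    """%-encode every character outside the RFC 3986 unreserved set.

    safe="" means "/" is escaped too, so no part of the string gets
    URI-structure-specific treatment; non-ASCII is encoded as UTF-8.
    """
    return quote(s, safe="")


print(encode_for_uri("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean"))
# http%3A%2F%2Fwww.example.com%2F00%2FWeather%2FCA%2FLos%2520Angeles%23ocean
print(encode_for_uri("~bébé"))
# ~b%C3%A9b%C3%A9
```

Note that an existing "%20" is escaped again to "%2520": the function encodes a string for use inside a URI, it does not normalise one.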

> Prefer naming like ENCODE_URI(), ENCODE_URI_COMPONENT(), learning from Javascript's mistake.

I have no particular opinion, but it's more like the latter (which is a
tad long).

ENCODE_FOR_URI?

> Would also like DECODE_* forms.
>
> [ I'd like MD5_HEX() and SHA1_HEX() too, returning hex encoded simple literals, very useful when minting stable identifiers, but not going to fight for it ]

Can we make that a separate issue?  I was considering the resolution
from the last WG meeting.  Someone (Paul?) took an action to write to the
list about it.
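For that separate issue, a sketch of what hex-encoded digest functions might compute, using Python's `hashlib`. The names `md5_hex` and `sha1_hex` are hypothetical, and hashing the UTF-8 bytes of the lexical form is an assumption; nothing here settles which encoding would be hashed.

```python
import hashlib


def md5_hex(lexical: str) -> str:
    # Assumption: hash the UTF-8 encoding of the literal's lexical form.
    return hashlib.md5(lexical.encode("utf-8")).hexdigest()


def sha1_hex(lexical: str) -> str:
    # Same assumption; hexdigest() returns lowercase hex.
    return hashlib.sha1(lexical.encode("utf-8")).hexdigest()


# Stable: the same input always yields the same identifier fragment.
print(md5_hex("abc"))   # 900150983cd24fb0d6963f7d28e17f72
print(sha1_hex("abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```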

>> CONCAT(string*)

>> UCASE(string)
>> LCASE(string)
>>   Design-2 applies.
>>   UCASE("abc") -> "ABC"
>>   UCASE("abc"@de) -> "ABC"@de
>>   UCASE("abc"^^xsd:string) -> "ABC"^^xsd:string
>
> Just to note, supporting this for the bulk of Unicode is quite a heavy requirement. I do think we should have it though.

Yes, it is amusing, isn't it? :-)
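A sketch of Design-2 as the UCASE examples above describe it: only the lexical form changes, and any language tag or xsd:string datatype is carried over. The `Literal` class is a hypothetical model of an RDF string literal, not any library's API; case mapping is delegated to Python's `str.upper()`/`str.lower()`.

```python
from dataclasses import dataclass, replace
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal model.

    Simple literal: lang and datatype both None.
    Language-tagged: lang set.  Typed: datatype set.
    """
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def ucase(lit: Literal) -> Literal:
    # Design-2: change the lexical form, keep the tag or datatype as-is.
    return replace(lit, lexical=lit.lexical.upper())


def lcase(lit: Literal) -> Literal:
    return replace(lit, lexical=lit.lexical.lower())


print(ucase(Literal("abc")))
print(ucase(Literal("abc", lang="de")))
print(ucase(Literal("abc", datatype=XSD_STRING)))
# str.upper() applies Unicode full case mapping, so length can change:
print(ucase(Literal("straße", lang="de")).lexical)  # STRASSE
```

The last line shows why full Unicode support is a real requirement: a correct UCASE cannot be a simple per-character table lookup that preserves string length.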

>> ENCODES(string)
>>   Result is a simple literal regardless of string.
>>   string can be simple, or xsd:string
>>   Not clear to me it should apply to LitLang
>>     proposal: it does not (it is an error).
>
> This could potentially cause some confusion if it's the only stringy function that will give an error when given a literal with lang tag.
>
> I've also seen plenty of documents in the wild with <rdf:RDF xml:lang="en">, so all literals ended up with language tags by default, regardless of whether that makes any sense. People might reasonably want to do:
>
> URI(CONCAT(STR(prefix:), ENCODE_URI_COMPONENT(?code)))

URI(CONCAT(STR(prefix:), ENCODE_URI_COMPONENT(STR(?code))))

>
> and will be surprised when ?code -> "Zm9vCg=="@en causes that result to be dropped.

Fair point, but in this operation the return type is not going to be
language-tagged, whereas it may be elsewhere; i.e. it has different
requirements anyway.
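A sketch of the proposed typing rule and the STR() workaround, under stated assumptions: `Literal`, `ExprError`, `str_` and the namespace `http://example.org/code/` are hypothetical; the %-encoding is approximated with `urllib.parse.quote(..., safe="")`; and rejecting non-string datatypes is an assumption drawn from "string can be simple, or xsd:string".

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    # Hypothetical model: simple literal when lang and datatype are None.
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


class ExprError(Exception):
    """Stands in for a SPARQL expression evaluation error."""


def str_(lit: Literal) -> Literal:
    # STR(): the lexical form as a simple literal (tag/datatype dropped).
    return Literal(lit.lexical)


def encode_for_uri(lit: Literal) -> Literal:
    # Proposal: a language-tagged argument is an error.
    if lit.lang is not None:
        raise ExprError("language-tagged argument")
    # Assumption: datatypes other than xsd:string are also errors.
    if lit.datatype not in (None, XSD_STRING):
        raise ExprError("non-string argument")
    # Result is a simple literal regardless of the argument's type.
    return Literal(quote(lit.lexical, safe=""))


code = Literal("Zm9vCg==", lang="en")  # e.g. from <rdf:RDF xml:lang="en">
try:
    encode_for_uri(code)
except ExprError as e:
    print("error:", e)  # the case Steve expects to surprise users

prefix = "http://example.org/code/"  # hypothetical namespace
iri = prefix + encode_for_uri(str_(code)).lexical
print(iri)  # http://example.org/code/Zm9vCg%3D%3D
```

Wrapping the argument in STR() first, as in the rewritten query, strips the tag so the encoding always succeeds.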

>> CONCAT(string*)
>>
>> If all the strings are simple literals
>>    -> simple literal
>>
>> If the strings are a mix of simple literals and one or more xsd:string
>>    -> xsd:string
>
> For commonality with the rules below, this might be better returning a simple literal.

No strong opinion here but there is a reason:

My thinking was that if an xsd:string comes from the data, but the
query writes a simple literal (for convenience), then the result is typed.

e.g. CONCAT(?var1, " -> ", ?var2)

and ?var1 and ?var2 are xsd:strings from the data.

What do others think?

>> CONCAT("abc"@en, "def"@en-UK, "z"^^xsd:string) -> "abcdefz"

Should have been "abcdefz"^^xsd:string.
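A sketch of the CONCAT typing rule as proposed in this thread, including the corrected "abcdefz"^^xsd:string result. The `Literal` class is a hypothetical model and every argument is assumed to be a string literal. Arguments that mix only simple and language-tagged literals are not covered by the rules quoted here, so the sketch deliberately leaves that case open rather than guess.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    # Hypothetical model: simple literal when lang and datatype are None.
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None

    def is_simple(self) -> bool:
        return self.lang is None and self.datatype is None


def concat(args: Sequence[Literal]) -> Literal:
    lexical = "".join(a.lexical for a in args)
    if all(a.is_simple() for a in args):
        # All simple literals -> simple literal.
        return Literal(lexical)
    if any(a.datatype == XSD_STRING for a in args):
        # Proposal: one or more xsd:string arguments -> xsd:string,
        # even alongside simple or language-tagged arguments.
        return Literal(lexical, datatype=XSD_STRING)
    # Not covered by the rules in this message; left open here.
    raise NotImplementedError("typing rule not covered by this proposal")


# CONCAT(?var1, " -> ", ?var2) with xsd:strings from the data:
var1 = Literal("a", datatype=XSD_STRING)
var2 = Literal("b", datatype=XSD_STRING)
r = concat([var1, Literal(" -> "), var2])
print(r.lexical, r.datatype == XSD_STRING)  # a -> b True
```

Under Steve's alternative, the mixed simple/xsd:string case would return a simple literal instead; only the second branch would change.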

	Andy

Received on Thursday, 2 December 2010 11:57:31 UTC