- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 02 Dec 2010 11:56:51 +0000
- To: Steve Harris <steve.harris@garlik.com>
- CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Summary: outstanding:

Is a mix of simple literals and XSD strings on CONCAT going to return a simple literal or xsd:string?

Example:
  CONCAT(?var1, " -> ", ?var2)
?var1 and/or ?var2 are xsd:strings.
Is the result a simple literal or xsd:string?

Discussion below.

	Andy

On 02/12/10 10:58, Steve Harris wrote:
>> ENCODES(string)
>
> ENCODES() strikes me as strange naming, sounds like a predicate.

As Greg pointed out, ENCODE.

> There are many URI encodings people might reasonably want, "full" URI (encodeURI() in Javascript 1.5), URI component (encodeURIComponent() in Javascript 1.5), plus there's also base64 etc. encoding.

This is specifically a counterpart to fn:encode-for-uri, i.e. %-encoding. The "for URI" is significant - it's not applying different rules to different parts of the string (e.g. hostnames).

> Prefer naming like ENCODE_URI(), ENCODE_URI_COMPONENT(), learning from Javascript's mistake.

I have no particular opinion but it's more like the latter (which is a tad long). ENCODE_FOR_URI?

> Would also like DECODE_* forms.
>
> [ I'd like MD5_HEX() and SHA1_HEX() too, returning hex encoded simple literals, very useful when minting stable identifiers, but not going to fight for it ]

Can we make that a separate issue? I was considering the resolution of the WG from the last WG. Someone (Paul?) took an action to write to the list about it.

>> CONCAT(string*)
>> UCASE(string)
>> LCASE(string)
>> Design-2 applies.
>> UCASE("abc") -> "ABC"
>> UCASE("abc"@de) -> "ABC"@de
>> UCASE("abc"^^xsd:string) -> "ABC"^^xsd:string
>
> Just to note, supporting this for the bulk of unicode is quite a heavy requirement. I do think we should have it though.

Yes - it is amusing isn't it :-)

>> ENCODES(string)
>> Result is a simple literal regardless of string.
>> string can be simple, or xsd:string
>> Not clear to me it should apply to LitLang
>> proposal: it does not (it is an error).
>
> This could potentially cause some confusion if it's the only stringy function that will give an error when given a literal with lang tag.
>
> I've also seen plenty of documents in the wild with <rdf:RDF xml:lang="en">, so all literals ended up with language tags by default, regardless of whether that makes any sense. People might reasonably want to do:
>
> URI(CONCAT(STR(prefix:), ENCODE_URI_COMPONENT(?code)))

URI(CONCAT(STR(prefix:), ENCODE_URI_COMPONENT(STR(?code))))

>
> and will be surprised when ?code -> "Zm9vCg=="@en causes that result to be dropped.

Fair point but, in this operation, the return type is not going to be lang tagged whereas it may be elsewhere, i.e. it has different requirements anyway.

>> CONCAT(string*)
>>
>> If all the strings are simple literals
>> -> simple literals
>>
>> If the strings are a mix of simple literals and one or more xsd:string
>> -> xsd:string
>
> For commonality with the rules below, this might be better returning a simple literal.

No strong opinion here but there is a reason:

My thinking was that if there is an xsd:string from the data, but the query writes a simple literal (convenience) then the result is typed.

e.g. CONCAT(?var1, " -> ", ?var2) and ?var1 and ?var2 are xsd:strings from the data.

What do others think?

>> CONCAT("abc"@en, "def"@en-UK, "z"^^xsd:string) -> "abcdefz"

Should have been "abcdefz"^^xsd:string.

	Andy
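For illustration only, the CONCAT result-typing rule as proposed in this message (not a WG resolution) can be sketched in Python. The `Literal` class, the `concat` function and the `XSD_STRING` constant are hypothetical names introduced for this sketch. The message does not say what happens when language-tagged literals are combined without any xsd:string argument, so the sketch refuses that case rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional

# Datatype IRI for xsd:string.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical minimal model of an RDF literal."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None

    def is_simple(self) -> bool:
        # A simple literal has neither a language tag nor a datatype.
        return self.lang is None and self.datatype is None

    def is_stringy(self) -> bool:
        # Simple literal, language-tagged literal, or xsd:string.
        return self.datatype is None or self.datatype == XSD_STRING


def concat(*args: Literal) -> Literal:
    """Sketch of the proposed CONCAT typing; an assumption, not spec text."""
    if not all(a.is_stringy() for a in args):
        raise TypeError("CONCAT expects string literals")
    text = "".join(a.lexical for a in args)
    # All simple literals -> simple literal.
    if all(a.is_simple() for a in args):
        return Literal(text)
    # One or more xsd:string among the arguments -> xsd:string.
    if any(a.datatype == XSD_STRING for a in args):
        return Literal(text, datatype=XSD_STRING)
    # Language-tagged arguments with no xsd:string: not covered by the message.
    raise NotImplementedError("case not specified in this message")


# The corrected example from the message:
result = concat(
    Literal("abc", lang="en"),
    Literal("def", lang="en-UK"),
    Literal("z", datatype=XSD_STRING),
)
print(result)
```

Under the alternative Steve suggests, the second branch would return a simple literal instead; only that one line would change.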
Received on Thursday, 2 December 2010 11:57:31 UTC