- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Wed, 01 Dec 2010 22:30:15 +0000
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
This message gives the details of executing the WG decisions from
http://www.w3.org/2009/sparql/meeting/2010-11-30

It would be good if these details get reviewed, but if I hear nothing, this is the approach I'll take when I write up the content (that won't be immediately).

Suggestion: change the name from LENGTH to STRLEN because "LENGTH" might imply RDF lists, or paths of Seq.

Suggestion: change the name from SUBSTRING to SUBSTR just to make it shorter, and 'STR' is used for strings in SPARQL elsewhere.

Details of string operations:

  STRLEN(string)
  SUBSTR(string, int, int)
  UCASE(string)
  LCASE(string)
  ENDS(string, string)
  STARTS(string, string)
  CONTAINS(string, string)
  ENCODES(string)
  CONCAT(string*)

Issues to sort out are around different flavo(u)rs of string. Unlike F&O we have 3 string forms: xsd:string, simple literal (the SPARQL term for a plain literal without a language tag) and plain literals with language tag ("LitLang", from now on).

Design:

1/ Operations cover simple literal, LitLang, xsd:string.
   This makes it a good thing we have our own IRIs - the F&O operations only cover xsd:string.

2/ The return type will be the form of the principal argument.
   The principal argument means the one the operation is acting on.

   So
     Operations on xsd:string yield xsd:string
     Operations on LitLang yield @lang but not with mixing of @tags
     Operations on simple literal yield simple literals

3/ Different language tags do not match or compare when mixed.
   Note that "Script" and "dialect" are parts of a language tag.

STRLEN(string) -> integer

SUBSTR(string, int) -> string
SUBSTR(string, int, int) -> string

  Design-2 applies.
  The first argument is the "principal argument".

  Caution: F&O uses 1-based indexing, + length.
  Warning to Java programmers and others, it's not [start,end)

UCASE(string)
LCASE(string)

  Design-2 applies.

  UCASE("abc") -> "ABC"
  UCASE("abc"@de) -> "ABC"@de
  UCASE("abc"^^xsd:string) -> "ABC"^^xsd:string

ENDS(string, string)
STARTS(string, string)
CONTAINS(string, string)

  STARTS("abc", "a") -> true
  STARTS("abc"@en, "a"@en) -> true
  STARTS("abc"@en, "a"@en-UK) -> false *** (could be error)

  Must be the same language tag if two language tags are present (else false or error).

  NB: This works:
    STARTS(str(?uri), str(prefix:))

ENCODES(string)

  Result is a simple literal regardless of string.
  string can be simple, or xsd:string.
  Not clear to me it should apply to LitLang.
  Proposal: it does not (it is an error).

CONCAT(string*)

  If all the strings are simple literals
    -> simple literals

  If the strings are a mix of simple literals and one or more xsd:string
    -> xsd:string

  If the strings are a mix of simple literals, xsd:strings and LitLang, and the lang tags are all the same
    -> plain literal with that language tag.

  If the strings are a mix of simple literals and plain literals and there are two or more different language tags
    -> simple literal

  NB: CONCAT("abc"@en, "def"@en-UK) -> "abcdef" because it has different language tags.

  If the strings are a mix of simple literals, xsd:strings and LitLang and there are two or more different language tags
    -> xsd:string

  CONCAT("abc"@en, "def"@en-UK, "z"^^xsd:string) -> "abcdefz"

Other types (including IRIs) do not get cast to string. Add STR() or xsd:string() as needed. This is a choice point: if the cast were implicit there would be two choices for it, STR() and xsd:string(), so I suggest we require explicit casts.

Andy
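
To make the SUBSTR indexing caution and the CONCAT result-form rules concrete, here is a minimal Python sketch of how they might be read. It is only an illustration under stated assumptions, not text from the WG decision: the Lit class and the substr and concat functions are hypothetical names, language tags are compared case-insensitively, and SUBSTR handles only integer arguments (F&O also defines rounding for non-integers, which is left out).

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Lit:
    """Hypothetical literal: simple (no lang, no datatype), LitLang, or xsd:string."""
    lex: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def check_string(a: Lit) -> None:
    # Other types are not cast implicitly; the caller must add STR() or xsd:string().
    if a.datatype not in (None, XSD_STRING):
        raise TypeError("not a string form: " + repr(a))


def substr(s: Lit, start: int, length: Optional[int] = None) -> Lit:
    """1-based start plus length (F&O style), not a [start, end) range."""
    check_string(s)
    begin = max(start - 1, 0)
    end = None if length is None else max(start - 1 + length, 0)
    # Design-2: the result keeps the form of the principal argument.
    return Lit(s.lex[begin:end], s.lang, s.datatype)


def concat(*args: Lit) -> Lit:
    """Pick the result form from the mix of argument forms."""
    for a in args:
        check_string(a)
    lex = "".join(a.lex for a in args)
    tagged = [a.lang for a in args if a.lang is not None]
    distinct = {t.lower() for t in tagged}  # assumption: case-insensitive tags
    if len(distinct) == 1:
        # All language tags agree: plain literal with that tag.
        return Lit(lex, lang=tagged[0])
    if any(a.datatype == XSD_STRING for a in args):
        # At least one xsd:string and no single shared tag: xsd:string.
        return Lit(lex, datatype=XSD_STRING)
    # Only simple literals, or plain literals with conflicting tags.
    return Lit(lex)


print(substr(Lit("abcde"), 2, 3).lex)                     # bcd
print(concat(Lit("abc", "en"), Lit("def", "en-UK")))      # simple literal "abcdef"
```

Note that substr(Lit("abcde"), 2, 3) selects positions 2, 3 and 4, whereas a [start,end) reading of (2, 3) would select a single character.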
Received on Wednesday, 1 December 2010 22:30:52 UTC