- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 23 Oct 2006 15:24:59 +0100
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Eric Prud'hommeaux wrote:
> On Sat, Oct 21, 2006 at 05:53:29PM +0100, Seaborne, Andy wrote:
>>
>>
>> Eric Prud'hommeaux wrote:
>>> On Thu, Aug 24, 2006 at 09:45:33PM +0100, Seaborne, Andy wrote:
>>>> """
>>>> ACTION AndyS:
>>>> Write some tests for value testing (unknown types and extensibility) to
>>>> add to 2006/JulSep0086
>>>> """
>>>>
>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0086
>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0104
>>>>
>> . . .
>>
>>
>>>> Tests open-eq-07 to open-eq-10 work by taking a list of all possible term
>>>> forms, forming the cross product and seeing which are value-equal and
>>>> value-not-equal. This is done for data which contains the same compared
>>>> values and different but comparable values. These tests are exhaustive and
>>>> include literals with lang tags - because lang tags are not case
>>>> sensitive (nor is there a canonical form according to RFC3066) it seemed
>>>> reasonable to be able to equate "xyz"@EN with "xyz"@en. In effect, each lang
>>>> tag defines a separate value space - can't compare or test for equality
>>>> across them, but you can with the same language.
>>>>
>>>> "abc"@en = "abc"@EN
>>>> "xyz"@en > "abc"@en
>>>> "xyz"@en > "abc"@EN
>
> This creates the interesting conundrum that something is
> simultaneously equivalent and greaterThan:
>   "abc"@en = "abc"@EN ⇒ TRUE
>   "abc"@en > "abc"@EN ⇒ TRUE
> (and "abc"@EN < "abc"@en ⇒ TRUE)

Don't understand. How can "abc"@en > "abc"@EN be true?

>
> I would favor < over =, but I guess that depends on your use cases.
>
>>> There is no current language for case-insensitive language tags in
>>> SPARQL presently. My implementation failed these both because of
>>> case-sensitive language matching, and because they employed extra
>>> operators not currently in SPARQL.
>> Is it just a matter of expanding the table to include RDF plain literals
>> with language tags? ORDER BY defers to "<" if it can.
>
> I think "abc"@en > "abc"@EN is fully expressible with our current
> functions:
>
>   (STR(?a) != STR(?b) && STR(?a) > STR(?b))
>   ||
>   (STR(?a) == STR(?b) && LANG(?a) > LANG(?b)) # isn't "a" > "A" weird?

I'm not proposing any ordering across language tags. I am proposing that
"xyz"@en < "abc"@fr is an error. Can't compare across language tags.

>
> If the above analysis is correct, one could add a shortcut syntax for
> in the operator mapping table. (note: simple literal > simple literal
> is currently in the table.):
>
> [[
>   ┃A > B│simple literal│simple literal│op:numeric-equal(fn:compare(A, B), 1) │xsd:boolean┃
> + ┃A > B│plain literal │plain literal │logical-or(
>     logical-and(fn:not(op:numeric-equal(fn:compare(str(A), str(B)), 0)),
>                 op:numeric-equal(fn:compare(lang(A), lang(B)), 1)),
>     logical-and(op:numeric-equal(fn:compare(str(A), str(B)), 0),
>                 op:numeric-equal(fn:compare(str(A), str(B)), 1)))│xsd:boolean┃
> ]]

Something like that if lang(A) = lang(B) needs to be case insensitive.

> or one could add functions for each of < > <= >= ala:
> [[
> + ┃A > B│plain literal │plain literal │RDFplainLiteral-greaterThan(A, B)│xsd:boolean┃
>
> RDFplainLiteral-greaterThan
>   xsd:boolean RDFplainLiteral-greaterThan (plain literal lit1, plain literal lit2)
>
> If the lexical values of lit1 and lit2 are identical,
> RDFplainLiteral-greaterThan returns TRUE or FALSE depending on whether
> LANG(lit1) > LANG(lit2). If the lexical values are not identical,
> RDFplainLiteral-greaterThan returns TRUE or FALSE depending on whether
> STR(lit1) > STR(lit2).
> ]]
>
> These specifications were assuming that you wanted this sort order:
>   "abb"
>   "abc"
>   "abc"@EN
>   "abc"@eN
>   "abc"@En
>   "abc"@en
>   "abc"@en-fr # zis iss how we speak here
>   "abd"

Personally, I would not worry about ordering of lang tags - a system may have
lost the original form. But codepoint is the most natural.
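
The two positions above can be sketched side by side in Python. This is only an illustration under stated assumptions, not text from any SPARQL draft: the names PlainLiteral, ComparisonError, value_equal, value_greater and total_order_key are hypothetical, a literal with no tag is treated as having the empty tag, and Python's built-in string comparison (by code point) stands in for fn:compare.

```python
from dataclasses import dataclass
from typing import Optional


class ComparisonError(Exception):
    """Stands in for a SPARQL type error (hypothetical name)."""


@dataclass(frozen=True)
class PlainLiteral:
    """Hypothetical model of an RDF plain literal."""
    lexical: str
    lang: Optional[str] = None  # None means no language tag


def _check_same_language(a: PlainLiteral, b: PlainLiteral) -> None:
    # Tags are compared case-insensitively, so "en" and "EN" name one value space.
    if (a.lang or "").lower() != (b.lang or "").lower():
        raise ComparisonError(
            f"cannot compare across language tags: {a.lang!r} vs {b.lang!r}"
        )


def value_equal(a: PlainLiteral, b: PlainLiteral) -> bool:
    """Equality within one language; an error across languages."""
    _check_same_language(a, b)
    return a.lexical == b.lexical


def value_greater(a: PlainLiteral, b: PlainLiteral) -> bool:
    """Ordering within one language; an error across languages."""
    _check_same_language(a, b)
    return a.lexical > b.lexical  # Python compares str by code point


def total_order_key(lit: PlainLiteral):
    """Key for the alternative total ordering: lexical form first, then raw tag."""
    return (lit.lexical, lit.lang or "")
```

Under the first reading, value_greater(PlainLiteral("abc", "en"), PlainLiteral("abc", "EN")) is False because the two literals are equal, and comparing "xyz"@en with "abc"@fr raises ComparisonError. Passing total_order_key to sorted() gives the second reading, where the tag only breaks ties between identical lexical forms.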
>
>> I tried writing things out from the current operations alone:
>>
>> Some things can be written:
>>   ( lang(?x) = lang(?y) ) && str(?x) > str(?y)
>> but that only works cleanly for the same language tag - different would
>> cause false, not error which seems more natural and it's verbose.
>>
>> langMatches isn't symmetric but I think:
>>
>>   langMatches(lang(?x),lang(?y)) &&
>>   langMatches(lang(?y),lang(?x)) &&
>>   str(?x) > str(?y)
>>
>> attempts to handle the case-sensitivity issue because a language tag is a
>> special case of a language range. It becomes more verbose though - ugh.
>> Or a regex.
>
> REGEXP(LANG(?x), LANG(?y), 'i')
>
>> "11.3.1 Operator Extensibility" could explicitly cover this - I can accept
>> that language tag handling is an extension if there is text that states
>> that. So far we have really been thinking of extension by datatypes.
>
> [[
> Extended SPARQL implementations may support additional associations
> between operators and operator functions; this amounts to adding rows
> to the table above. No additional operator support may yield a result
> that replaces any result other than a type error in an unextended
> implementation.
> ]]
> I think I've convinced myself that it's extendable this way. You
> are adding rows that replace the type errors you would get in an
> unextended implementation.
>
> These rules just make sure that you don't lose dawg:monotinicity over
> DAWG-specified parts of the language. Ideally, people won't step on
> each other's truth values too much, but I don't think we can say much
> about that.

Specifically mentioning lang tags would be useful because they aren't
datatypes.

[[
The consequence of this rule is that extended SPARQL implementations
will produce at least the same solutions as an unextended
implementation, and may, for some queries, produce more solutions.
]]
isn't true by the way - filters can be negated so more or fewer solutions are
going to be possible with any kind of extensibility.
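
The symmetric langMatches idiom quoted above can be tried out with a small Python sketch. It assumes langMatches does simple case-insensitive prefix matching of a language range against a tag, with "*" matching any non-empty tag; lang_matches, same_tag and greater_same_language are hypothetical helper names, and literals are modelled as plain (lexical, tag) tuples.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Assumed prefix matching: the range equals the tag or is a prefix ending at "-"."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


def same_tag(x: str, y: str) -> bool:
    """Apply the match in both directions so that only equal tags pass."""
    return lang_matches(x, y) and lang_matches(y, x)


def greater_same_language(a, b) -> bool:
    """a and b are (lexical, tag) tuples; False, not an error, when tags differ."""
    return same_tag(a[1], b[1]) and a[0] > b[0]
```

The asymmetry shows up as lang_matches("en-GB", "en") being true while lang_matches("en", "en-GB") is false. Requiring both directions reduces the test to case-insensitive tag equality. As noted in the quoted text, differing tags produce false here rather than an error.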
That's why "!=" should mean "not known to be unequal" and not "not(known to be equal)"

	Andy
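
One way to see why the reading of "!=" matters for extensibility is a small three-valued model in Python. This is a sketch under assumptions, not the specification's algorithm: None stands in for a type error, a FILTER keeps a solution only when the result is True, eq_unextended and eq_extended are hypothetical stand-ins for an implementation without and with knowledge of some unknown datatype, and values are modelled by their lexical forms.

```python
from typing import Callable, Optional

# Three-valued filter result: True, False, or None standing in for a type error.
Tri = Optional[bool]


def negate(v: Tri) -> Tri:
    """Negation propagates errors: !error is still an error."""
    return None if v is None else not v


def keeps_solution(v: Tri) -> bool:
    """A FILTER keeps a solution only when its expression is True."""
    return v is True


def eq_unextended(a: str, b: str) -> Tri:
    """Unknown datatype: identical terms are equal, anything else is an error."""
    return True if a == b else None


def eq_extended(a: str, b: str) -> Tri:
    """Hypothetical extension that understands the datatype as numeric."""
    return float(a) == float(b)


def ne_error_propagating(eq: Callable[[str, str], Tri], a: str, b: str) -> Tri:
    """'!=' as the negation of '=', so an unknown comparison stays an error."""
    return negate(eq(a, b))


def ne_not_known_equal(eq: Callable[[str, str], Tri], a: str, b: str) -> Tri:
    """'!=' read as not(known to be equal): errors turn into True."""
    return eq(a, b) is not True
```

With the error-propagating reading, the extension can only add solutions: for "1" and "2" the unextended result is an error and the extended result is True. With the not(known to be equal) reading, the unextended implementation keeps the pair "1" and "1.0", and the extension, knowing the values are equal, drops it.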
Received on Monday, 23 October 2006 14:25:48 UTC