Re: Language tags and value testing

On Mon, Oct 23, 2006 at 03:24:59PM +0100, Seaborne, Andy wrote:
> 
> 
> 
> Eric Prud'hommeaux wrote:
> >On Sat, Oct 21, 2006 at 05:53:29PM +0100, Seaborne, Andy wrote:
> >>
> >>
> >>Eric Prud'hommeaux wrote:
> >>>On Thu, Aug 24, 2006 at 09:45:33PM +0100, Seaborne, Andy wrote:
> >>>>"""
> >>>>ACTION AndyS:
> >>>>Write some tests for value testing (unknown types and extensibility) to
> >>>>add to 2006/JulSep0086
> >>>>"""
> >>>>
> >>>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JulSep/0086
> >>>>http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0104
> >>>>
> >>. . .
> >>
> >>
> >>>>Tests open-eq-07 to open-eq-10 work by taking a list of all possible
> >>>>term forms, forming the cross product and seeing which are value-equal
> >>>>and value-not-equal.  This is done for data which contains the same
> >>>>compared values and different but comparable values.  These tests are
> >>>>exhaustive and include literals with lang tags - because lang tags are
> >>>>not case sensitive (nor is there a canonical form according to RFC3066)
> >>>>it seemed reasonable to be able to equate "xyz"@EN with "xyz"@en. In
> >>>>effect, each lang tag defines a separate value space - can't compare or
> >>>>test for equality across them, but you can with the same language.
> >>>>
> >>>>"abc"@en = "abc"@EN
> >>>>"xyz"@en > "abc"@en
> >>>>"xyz"@en > "abc"@EN
> >
> >This creates the interesting conundrum that something is
> >simultaneously equivalent and greaterThan:
> >     "abc"@en = "abc"@EN ⇒ TRUE
> >     "abc"@en > "abc"@EN ⇒ TRUE
> >(and "abc"@EN < "abc"@en ⇒ TRUE)
> 
> Don't understand.  How can "abc"@en > "abc"@EN be true?
> 
> >
> >I would favor < over =, but I guess that depends on your use cases.
> >
> >>>There is currently no language for case-insensitive language tags in
> >>>SPARQL. My implementation failed these both because of
> >>>case-sensitive language matching, and because they employed extra
> >>>operators not currently in SPARQL.
> >>Is it just a matter of expanding the table to include RDF plain literals 
> >>with language tags? ORDER BY defers to "<" if it can.
> >
> >I think "abc"@en > "abc"@EN is fully expressible with our current
> >functions:
> >
> >  (STR(?a) != STR(?b) && STR(?a) > STR(?b))
> >    || 
> >  (STR(?a) == STR(?b) && LANG(?a) > LANG(?b))  # isn't "a" > "A" weird?
> 
> 
> I'm not proposing any ordering across language tags.
> 
> I am proposing "xyz"@en < "abc"@fr is an error.  Can't compare across 
> language tags.
> 

Oops, most of the mappings in my mail are apparently irrelevant to
your proposed functionality.
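
As I now read the proposal, each language tag (compared case-insensitively) has its own value space, and comparing across tags is an error rather than an ordering. A minimal Python sketch of that reading follows. The names PlainLiteral, CompareError, lit_eq and lit_gt are made up for illustration and do not come from any SPARQL draft or implementation.

```python
from dataclasses import dataclass


class CompareError(Exception):
    """Stands in for a SPARQL type error (hypothetical name)."""


@dataclass(frozen=True)
class PlainLiteral:
    lexical: str
    lang: str = ""  # empty string means no language tag


def _same_value_space(a, b):
    # Language tags are case-insensitive, so "en" and "EN" name the same space.
    if a.lang.lower() != b.lang.lower():
        raise CompareError(f"cannot compare @{a.lang} with @{b.lang}")


def lit_eq(a, b):
    _same_value_space(a, b)
    return a.lexical == b.lexical


def lit_gt(a, b):
    _same_value_space(a, b)
    return a.lexical > b.lexical  # codepoint order on the lexical form
```

Under this reading "abc"@en = "abc"@EN holds, "abc"@en > "abc"@EN does not, and "xyz"@en < "abc"@fr raises the error instead of returning a truth value.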

> >
> >If the above analysis is correct, one could add a shortcut syntax for
> >it in the operator mapping table (note: simple literal > simple literal
> >is currently in the table):
> >
> >[[
> >  ┃A > B│simple literal│simple literal│op:numeric-equal(fn:compare(A, B), 1)│xsd:boolean┃
> >+ ┃A > B│plain literal │plain literal │logical-or(
> >                                         logical-and(fn:not(op:numeric-equal(fn:compare(str(A), str(B)), 0)), 
> >                                            op:numeric-equal(fn:compare(str(A), str(B)), 1)), 
> >                                         logical-and(op:numeric-equal(fn:compare(str(A), str(B)), 0), 
> >                                            op:numeric-equal(fn:compare(lang(A), lang(B)), 1)))│xsd:boolean┃
> >]]
> Something like that if lang(A) = lang(B) needs to be case insensitive.
> 
> >or one could add functions for each of < > <= >= ala:
> >[[
> >+ ┃A > B│plain literal │plain literal │RDFplainLiteral-greaterThan(A, B)│xsd:boolean┃
> >
> >RDFplainLiteral-greaterThan
> >  xsd:boolean   RDFplainLiteral-greaterThan (plain literal lit1, plain literal lit2)
> >
> >If the lexical values of lit1 and lit2 are identical,
> >RDFplainLiteral-greaterThan returns TRUE or FALSE depending on whether
> >LANG(lit1) > LANG(lit2). If the lexical values are not identical,
> >RDFplainLiteral-greaterThan returns TRUE or FALSE depending on whether
> >STR(lit1) > STR(lit2).
> >]]
> >
> >These specifications were assuming that you wanted this sort order:
> >     "abb"
> >     "abc"
> >     "abc"@EN
> >     "abc"@En
> >     "abc"@eN
> >     "abc"@en
> >     "abc"@en-fr # zis iss how we speak here
> >     "abd"
> 
> Personally, I would not worry about ordering of lang tags - a system may 
> have lost the original form.  But codepoint is the most natural.
> 
> >
> >>I tried writing things out from the current operations alone:
> >>
> >>Some things can be written:
> >>  ( lang(?x) = lang(?y) ) && str(?x) > str(?y)
> >>but that only works cleanly for the same language tag - different tags
> >>would give false, not the error that seems more natural, and it's
> >>verbose.
> >>
> >>langMatches isn't symmetric but I think:
> >>
> >>  langMatches(lang(?x),lang(?y)) &&
> >>  langMatches(lang(?y),lang(?x)) &&
> >>  str(?x) > str(?y)
> >>
> >>attempts to handle the case-sensitivity issue because a language tag is a 
> >>special case of a language range.  It becomes more verbose though - ugh.  
> >>Or a regex.
> >
> >    REGEX(LANG(?x), LANG(?y), 'i')
> >
> >>"11.3.1 Operator Extensibility" could explicitly cover this - I can 
> >>accept that language tag handling is an extension if there is text that 
> >>states that. So far we have really been thinking of extension by 
> >>datatypes.
> >
> >[[
> >Extended SPARQL implementations may support additional associations
> >between operators and operator functions; this amounts to adding rows
> >to the table above. No additional operator support may yield a result
> >that replaces any result other than a type error in an unextended
> >implementation.
> >]]
> >I think I've convinced myself that it's extendable this way. You
> >are adding rows that replace the type errors you would get in an
> >unextended implementation.
> >
> >These rules just make sure that you don't lose dawg:monotonicity over
> >DAWG-specified parts of the language. Ideally, people won't step on
> >each other's truth values too much, but I don't think we can say much
> >about that.
> 
> Specifically mentioning lang tags would be useful because they aren't 
> datatypes.
> 
> [[
> The consequence of this rule is that extended SPARQL implementations will 
> produce at least the same solutions as an unextended implementation, and 
> may, for some queries, produce more solutions.
> ]]
> isn't true by the way - filters can be negated so more or fewer solutions 
> are going to be possible with any kind of extensibility.

Here are the results of a date comparison, with and without negation
and with and without support for xsd:date, for this test with only one
of the FILTERs applied at a time:

Data:
<http://example.org/doc1> dc:date "2001-12-03T13:41"^^xsd:dateTime .
<http://example.org/doc2> dc:date "2003-10-03"^^xsd:date .

Query:
SELECT ?doc
 WHERE { ?doc dc:date ?d
         FILTER (?d > "2000-00-00T12:00"^^xsd:dateTime)    # t - TRUE
         FILTER (!(?d < "2000-00-00T12:00"^^xsd:dateTime)) # tn - TRUE-negated
         FILTER (?d < "2000-00-00T12:00"^^xsd:dateTime)    # f - FALSE
         FILTER (!(?d > "2000-00-00T12:00"^^xsd:dateTime)) # fn - FALSE-negated
       }

     without xsd:date            with xsd:date support
t    <http://example.org/doc1>   <http://example.org/doc1>
                                 <http://example.org/doc2>
tn   <http://example.org/doc1>   <http://example.org/doc1>
                                 <http://example.org/doc2>
f
fn

That is, f and fn get no results.
Can you come up with a case that meets our notion of monotonicity that
violates this "produce at least the same solutions" assertion?
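
The same experiment can be mimicked outside a SPARQL engine. The Python sketch below is only a model, under these assumptions: a type error removes the solution whether or not the comparison is negated, the xsd:date value is widened to midnight, and the constant is replaced by a valid stand-in (2000-01-01T12:00). TypeErr, compare, run and the other names are hypothetical.

```python
from datetime import datetime


class TypeErr(Exception):
    """Stands in for a SPARQL type error (hypothetical name)."""


# (document, datatype, value); the xsd:date is widened to midnight for comparison.
DATA = [
    ("doc1", "xsd:dateTime", datetime(2001, 12, 3, 13, 41)),
    ("doc2", "xsd:date", datetime(2003, 10, 3)),
]
# Assumption: a valid constant standing in for the one used in the query.
BOUND = datetime(2000, 1, 1, 12, 0)


def compare(op, datatype, value, supported):
    # Unsupported datatypes raise instead of returning a truth value.
    if datatype not in supported:
        raise TypeErr(datatype)
    return value > BOUND if op == ">" else value < BOUND


def run(op, negate, supported):
    out = []
    for doc, datatype, value in DATA:
        try:
            result = compare(op, datatype, value, supported)
        except TypeErr:
            continue  # an error eliminates the solution, negated or not
        if result != negate:
            out.append(doc)
    return out


FILTERS = {"t": (">", False), "tn": ("<", True), "f": ("<", False), "fn": (">", True)}
BASE = {"xsd:dateTime"}
EXTENDED = BASE | {"xsd:date"}

for name, (op, negate) in FILTERS.items():
    print(name, run(op, negate, BASE), run(op, negate, EXTENDED))
```

In this model the extension only turns errors into truth values, so each filter's result set can only grow when xsd:date support is added.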

> That's why "!=" should mean "not known to be unequal" and not "not(known to 
> be equal)"

This is a separate issue, I believe. In XQuery, you are assumed to be
doing something wrong if you compare anything outside the standard
mapping, even to find out it is not equal. In SPARQL, we have to
decide if we want that same functionality, or change RDFterm-equal [TEQ]
[[
Returns TRUE if term1 and term2 are the same RDF term as defined in
Resource Description Framework (RDF): Concepts and Abstract Syntax
[CONCEPTS]; produces a type error if the arguments are both literal
but are not the same RDF term; returns FALSE otherwise. term1 and
term2 are the same if any of the following is true:
]]
to
[[
Returns TRUE if term1 and term2 are the same RDF term as defined in
Resource Description Framework (RDF): Concepts and Abstract Syntax
[CONCEPTS]; produces a type error if either of the arguments is a
literal but is not a supported term; returns FALSE otherwise. term1
and term2 are the same if any of the following is true: 
]]
to add = and != tests for all the supported types.
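
To make the difference between the two wordings concrete, here is a rough Python sketch of how I read them. It assumes that "supported term" means a literal whose datatype the implementation knows, and that value equality for supported types is handled by other rows in the operator table before this fallback is reached. Term, TypeErr, term_equal_current and term_equal_changed are illustrative names only.

```python
from collections import namedtuple


class TypeErr(Exception):
    """Stands in for a SPARQL type error (hypothetical name)."""


# kind is "iri", "bnode" or "literal"; datatype is None for non-literals.
Term = namedtuple("Term", "kind lexical datatype")


def term_equal_current(t1, t2):
    # Current wording: two different literals always give a type error.
    if t1 == t2:
        return True
    if t1.kind == "literal" and t2.kind == "literal":
        raise TypeErr("literals are not the same term")
    return False


def term_equal_changed(t1, t2, supported):
    # Changed wording: only literals of unsupported datatypes give an error.
    if t1 == t2:
        return True
    for t in (t1, t2):
        if t.kind == "literal" and t.datatype not in supported:
            raise TypeErr(f"unsupported literal: {t.datatype}")
    return False
```

With the changed wording, two different literals of supported datatypes compare as FALSE (so != is TRUE) instead of raising an error.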

[TEQ] http://www.w3.org/2001/sw/DataAccess/rq23/rq24#func-RDFterm-equal
-- 
-eric

home-office: +1.617.395.1213 (usually 900-2300 CET)
     +33.1.45.35.62.14
cell:       +33.6.73.84.87.26

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Monday, 23 October 2006 15:38:46 UTC