Re: adding dawg:monotonicity and extensible data types to SPARQL query

Eric Prud'hommeaux wrote:
> On Mon, Aug 21, 2006 at 03:06:52PM +0100, Seaborne, Andy wrote:
>>
>>
>> Eric Prud'hommeaux wrote:
>>> On Mon, Aug 14, 2006 at 01:08:03PM +0200, Eric Prud'hommeaux wrote:
>>> http://www.w3.org/2001/sw/DataAccess/rq23/rq24#tests v1.14 has a new
>>> draft of the Value Testing section. This does not include the
>>> extensible datatypes support (but certainly makes it easier to add).
>>> This version is intended to include only editorial changes from the CR
>>> version.
>>>
>>>>   [DONE] ACTION: EricP to respond to PatH's new test with a proof of
>>>>   whether it's monotonic to extended datatype support [recorded in
>>>>   [25]http://www.w3.org/2006/08/08-dawg-minutes.html#action01]
>>>>   <fred> literal = literal: true or error
>>>>
>>>>   <fred> iri = iri: true or false
>>>>
>>>>   <fred> bnode = bnode: true or false
>>>>
>>>>   <fred> allother cells always false
>>>>
>>>>   2=3
>>>>
>>>>   <AndyS> Yes, Fred - that's the table I was thinking of.
>>> In 1.14, I've updated RDFterm-equal to the following:
>>>
>>> http://www.w3.org/2001/sw/DataAccess/rq23/rq24#func-RDFterm-equal
>>> [[
>>> Returns TRUE if term1 and term2 are the same RDF term as defined in
>>> Resource Description Framework (RDF): Concepts and Abstract Syntax
>>> [CONCEPTS]; produces a type error if the arguments are both literal
>>> but are not the same RDF term;
>> Isn't this a bit circular as to "same RDF term"?  Something about the 
>> equality of the three parts of lexical form, datatype and lang tag (for 
>> literals) etc etc.
> 
> That comes from the following text:
> [[
> term1 and term2 are the same if any of the following is true:
> 
>     * term1 and term2 are equivalent IRIs as defined in 6.4 RDF URI
>       References.
>     * term1 and term2 are equivalent literals as defined in 6.5.1
>       Literal Equality.
>     * term1 and term2 are the same blank node as described in 6.6
>       Blank Nodes.
> ]]

Yes - agreed - I was pointing out that the text before that is worded in a 
circular fashion.

"""produces a type error if the arguments are both literal but are not the 
same RDF term;"""

Because it comes before the definition of "same", it's confusing.
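
A minimal sketch of the proposed RDFterm-equal behaviour (TRUE, type error, 
or FALSE), written in Python purely for illustration. The class names, the 
SparqlTypeError exception and the roman-numeral datatype IRI are hypothetical, 
and language-tag case normalisation is ignored:

```python
from dataclasses import dataclass
from typing import Optional


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error (hypothetical name)."""


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def rdfterm_equal(term1, term2) -> bool:
    # Same kind of term with the same components: TRUE.
    if term1 == term2:
        return True
    # Two literals that are not the same term: an unextended
    # implementation cannot tell whether the values differ.
    if isinstance(term1, Literal) and isinstance(term2, Literal):
        raise SparqlTypeError("literals are not the same RDF term")
    # Every other cell of the table: FALSE.
    return False


def rdfterm_not_equal(term1, term2) -> bool:
    # "!=" is fn:not("="), so a type error propagates instead of flipping.
    return not rdfterm_equal(term1, term2)


ROMAN = "http://example.org/roman#numeral"  # assumed IRI, illustration only
two, four = Literal("II", ROMAN), Literal("IV", ROMAN)
```

With these definitions, rdfterm_equal(two, two) is true, while both 
rdfterm_equal(two, four) and rdfterm_not_equal(two, four) raise, which is 
why neither "=" nor "!=" yields a solution for such a pair in an unextended 
implementation.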

> 
>>> returns FALSE otherwise. term1 and
>>> term2 are the same if any of the following is true:
>>>
>>>    * term1 and term2 are equivalent IRIs as defined in 6.4 RDF URI
>>>      References.
>>>    * term1 and term2 are equivalent literals as defined in 6.5.1
>>>      Literal Equality.
>>>    * term1 and term2 are the same blank node as described in 6.6
>>>      Blank Nodes.
>>> ]]
>>>
>>> I added the "; produces a type error if the arguments are both literal
>>> but are not the same RDF term; returns FALSE otherwise" bit. The rest
>>> was already there.
>> Suggestion for a name for this : "unknown-equals" or "general-value-equals" 
>> and note that "=" may have been intercepted by a datatype specific 
>> definition of "=".
> 
> Earlier in 11.3.1 Operator Extensibility:
> [[
> Extended SPARQL implementations may support additional associations
> between operators and operator functions;
> ]]

I still suggest that the name be changed to "general-value-equals" or some 
such.  The note was to refer back to the text you quote to be clear.  I'm not 
saying the text as given was wrong, but that it could be clearer if it 
reiterated that the "=" symbol may have been overridden.

> 
> 
>> There should be text to give examples; and also for !=.
>>
>> Let's reserve "term-equals" language for a syntactic test and not having it 
>> generate an error because "term equality" suggests syntax (to me at least) 
>> without regard to value.
>>
>> An operator such as "sameTerm(?x, ?y)" would provide direct access to it 
>> (it's shorthand for something like the following):
>>
>> ( isURI(?x) && isURI(?y) && str(?x) = str(?y) ) ||
>> ( isBlank(?x) && isBlank(?y) && ... same labels .... ) ||
>> ( isLiteral(?x) && isLiteral(?y) &&
>>   str(?x) = str(?y) &&
>>    (
>>      ( lang(?x) = "" && lang(?y) = "" &&           # Same datatype, if any
>>         ( datatype(?x) = datatype(?y) || true ) )
>>    ||
>>      ( lang(?x) = lang(?y) )                        # Same lang, if any
>>    )
>> )
>>
>> The literal part is complex (and probably not correct in the above) because 
>> of lang tags and datatypes (and it's asymmetric in the treatment of no lang 
>> tag and no datatype).
> 
> Ah, good. Had thought about looking for this equivalence.
> 
>> There is no way to get the label of a bNode (which is OK).
>>
>> I assume datatype("eric"@fr) is an error - I can't find anything in rq24
> 
> Hmm, I expect that presently, it's not an error; it's just not useful
> to write it 'cause it won't match valid RDF data. I guess we've gone
> with the more restrictive approach on other counts, like literal
> subjects, so the precedent is there to say that is specifically malformed.


On IRC, Eric said that datatype() has signature typed literal and that 
datatype("plain literal") is an error.

RDF MT (rules xsd 1a, xsd 1b) allow plain literals and xsd:strings to be used 
interchangeably, making the value spaces of plain literals and xsd:strings the 
same.
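
One way to see that a purely syntactic sameTerm test never needs to call 
datatype() on a plain literal is to compare the components directly. This is 
a sketch in Python under my own assumptions: the Term tuple and same_term 
name are hypothetical, and language tags are compared case-insensitively as 
I read RDF Concepts:

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    kind: str                       # "iri", "bnode" or "literal"
    value: str                      # IRI string, bnode label or lexical form
    datatype: Optional[str] = None  # literals only; None for plain literals
    lang: Optional[str] = None      # literals only; None when there is no tag


def same_term(x: Term, y: Term) -> bool:
    """Syntactic comparison: always True or False, never a type error."""
    if x.kind != y.kind:
        return False
    if x.kind in ("iri", "bnode"):
        return x.value == y.value
    # Literals: lexical form, datatype and language tag must all match,
    # and "absent" only matches "absent".
    x_lang = x.lang.lower() if x.lang else None
    y_lang = y.lang.lower() if y.lang else None
    return (x.value, x.datatype, x_lang) == (y.value, y.datatype, y_lang)
```

Because absent datatypes and absent language tags are represented explicitly, 
the asymmetry in the SPARQL-level expansion does not arise here, and a plain 
"1" is not the same term as "1"^^xsd:integer.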

> 
>>>>   <AndyS> bNode = literal (not bNode in query) may be valid
>>>>
>>>>   <AndyS> Separate sameLiteral operator.
>>>>
>>>>   <AndyS> if we want a syntactic comparison
>>>>
>>>>   <AndyS> "(x,y)"^^:geo
>>>>
>>>>   <AndyS> If you want help with this, do ask - I'm the one keen to have
>>>>   this extensibility so I feel responsible here.
>>>>
>>>>   <kendallclark> ACTION: EricP to redraft section 11 to support
>>>>   extensible datatypes [recorded in
>>>>   [18]http://www.w3.org/2006/08/08-dawg-minutes.html#action08]
>>> To this end, I propose the following addendum to the derived types list:
>>> [[
>>> Extended SPARQL implementations may treat additional types as being
>>> derived from numeric types.
>>> ]]
>> There is no need to restrict things to numerics.  Any new value space is 
>> possible.  Examples:
>>
>> 1/ xsd:dates
>> 2/ Things with units.
>>    For a sufficiently knowledgeable processor:
>>    "273"^^:kelvin should not compare with "273"^^xsd:integer [*]
>>    "273"^^:kelvin should compare with "+273"^^:kelvin
>>    "275"^^:kelvin should compare with "2"^^:centigrade
>>
>> [*] Let's not confuse recording temperature as a number with recording it as a 
>> unit datatype.  :kelvin(273) would be needed.
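
A rough sketch of what "a sufficiently knowledgeable processor" could do for 
the units case, in Python. The function and exception names are hypothetical, 
literals are modelled as (lexical form, datatype) pairs, and the 273.15 
offset between :kelvin and :centigrade is my assumption about what those 
datatypes mean:

```python
KELVIN = ":kelvin"
CENTIGRADE = ":centigrade"
XSD_INTEGER = "xsd:integer"


class IncomparableError(Exception):
    """Stands in for a type error between unrelated value spaces."""


def to_kelvin(lexical: str, datatype: str) -> float:
    # Map a temperature literal into one shared value space (kelvin).
    if datatype == KELVIN:
        return float(lexical)
    if datatype == CENTIGRADE:
        return float(lexical) + 273.15
    raise IncomparableError(f"{datatype} is not a temperature datatype")


def compare(a, b) -> int:
    """Return -1, 0 or 1 for two (lexical, datatype) temperature literals."""
    ka, kb = to_kelvin(*a), to_kelvin(*b)
    return (ka > kb) - (ka < kb)
```

Here compare(("273", KELVIN), ("+273", KELVIN)) reports equal values despite 
the different lexical forms, ("275", KELVIN) is ordered against 
("2", CENTIGRADE), and comparing against ("273", XSD_INTEGER) raises.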
> 
> The text, in context, does not limit SPARQL implementations to
> extending numeric datatypes. (This is the pain of not just committing
> the text and having people look at it in situ.) Because numerics have
> a prescribed hierarchy in SPARQL, I needed to enumerate the minimally
> supported numeric data types. The above addendum points out that
> implementations may add to that list; meaning, respect the subtype
> substitution rules even with regards to the extended types.
> 
> Adding support for kelvin, date or other primitive data types would be
> addressed by adding new associations between operators and operator
> functions.
> 
> One piece missing in the puzzle is the subtypes of these types. This
> is, I believe, best addressed by altering 11.3 ¶2 to not scope the
> subtype substitution to numerics:
> [[
> SPARQL follows XPath's scheme for type promotions and subtype
> substitution for arguments to operators (see XML Path Language (XPath)
> 2.0 [XPATH20] for definitions of numeric type promotions and subtype
> substitution). The XPath Operator Mapping rules for numeric operands
> {xs:integer, xs:decimal, xs:float, xs:double, and types derived from a
> numeric type} apply to SPARQL operators as well. Some of the operators
> are associated with nested function expressions,
> e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath
> definitions, fn:not and op:numeric-equal produce an error if their
> argument is an error.
> ]]
> PROPOSED: to adopt the above text. An illustrative example is:
> 
> Type Extensions:
>   <xs:schema
>      targetNamespace="http://example.com/wannadate"
>      xmlns:xs="http://www.w3.org/2001/XMLSchema">
>     <xs:simpleType name="Canonical21stCenturyDate">
>       <xs:restriction base="xs:date">
>         <xs:pattern value="2[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"/>
>       </xs:restriction>
>     </xs:simpleType>
>   </xs:schema>
> 
> Data:
>   @prefix dt:     <http://example.com/wannadate> .
>   @prefix meeting: <http://example.com/meeting#> .
>   meeting:m1 meeting:date "2005-02-03"^^dt:Canonical21stCenturyDate .
>   meeting:m2 meeting:date "2006-02-01"^^dt:Canonical21stCenturyDate .
> 
> Query:
>   PREFIX dt: <http://example.com/wannadate>
>   PREFIX meeting: <http://example.com/meeting#>
>   SELECT ?m2
>    WHERE { ?m1 meeting:date ?m1Date .
>            ?m2 meeting:date ?m2Date
>            FILTER ( ?m1Date > ?m2Date ) }
> 
> Unextended Result:
>   +-----+
>   | ?m2 |
>   +-----+
>   +-----+
> 
> Extended Result:
>   +------------+
>   |    ?m2     |
>   +------------+
>   | meeting:m2 |
>   +------------+
> 
>>> and a new minor section following the operator table:
>>> [[
>>> 11.3.1 Operator Extensibility
>>>
>>> Extended SPARQL implementations may support additional associations
>>> between operators and operator functions; this amounts to adding rows
>>> to the table above. No additional operator support may yield a result
>>> that replaces any result other than a type error in an unextended
>>> implementation. The consequence of this rule is that extended SPARQL
>>> implementations will produce at least the same solutions as an
>>> unextended implementation, and may, for some queries, produce more
>>> solutions.
>>> ]]
>> The text "and may, for some queries, produce more solutions" won't be true 
>> because we have logical not.
> 
> Can you find a counter example?

The problem is using the word "query", not restricted to "expressions".  The 
usual OPTIONAL/BOUND trick is always going to provide loopholes because it's 
outside a FILTER.

Data:
:x :p "45"^^:dtype .

Query:
ASK { OPTIONAL { :x :p ?v . FILTER ( ?v < "67"^^:dtype ) }
       FILTER (bound(?v)) }
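
A small Python sketch of how this query behaves with and without support for 
:dtype. The function names are hypothetical, :dtype is assumed to have an 
integer-like value space, and the negate_bound flag (for a !bound(?v) 
variant) is my addition to show the loophole in both directions:

```python
def less_than(left, right, extended):
    """'<' on (lexical, datatype) pairs; None stands for a type error."""
    if left[1] != right[1]:
        return None
    if not extended:
        return None  # no operator function is associated with :dtype
    return int(left[0]) < int(right[0])


def ask(extended, negate_bound=False):
    data = [(":x", ":p", ("45", ":dtype"))]
    solution = {}  # the empty left-hand side of the OPTIONAL
    for s, p, o in data:
        if (s, p) != (":x", ":p"):
            continue
        # The OPTIONAL part only extends the solution if its FILTER is true;
        # an error counts the same as false here.
        if less_than(o, ("67", ":dtype"), extended) is True:
            solution = {"v": o}
    bound = "v" in solution
    # The outer FILTER sits outside the OPTIONAL, so it sees only
    # whether ?v was bound, not why.
    return (not bound) if negate_bound else bound
```

ask(extended=False) answers no and ask(extended=True) answers yes, but with 
negate_bound=True the answers swap: the unextended implementation says yes 
and the extended one says no, so the extension removes a solution at the 
query level.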



I've just noticed that "45"^^:dtype <= "45"^^:dtype is an error under your proposed 
design but can be true by explicitly having value-compare as I described.  I 
take back my comment that I thought your proposal was the same.

>>> I think this behaves exactly as sop:value-compare would.
>>>
>>>
>>> Cost:
>>>
>>> Is the cost of using the same operator for value comparison and symbol
>>> comparison less than that of making users use a different operator for
>>> RDFterm-equal? I think it's a matter of taste. The weird case in this
>>> solution is that you can't negate a syntactic literal equivalence
>>> test.
>> This isn't symbol comparison any more because the backstop "=" does not 
>> work on all symbol combinations (unknown datatypes, different lexical 
>> forms).
> 
> Again, I have to ask for a counter example.

My point was that the app may want access to the syntactic form, even when 
there is a value form (e.g. validation).  Overtaken by your other email so 
I'll reply to that.

	Andy

> 
>>> Data:
>>>  <x> <p> "II"^^roman:numeral .
>>>
>>> Query1:
>>>  ASK { ?x ?p ?v
>>>        FILTER (?v = "IV"^^roman:numeral) }
>>> Result1: no
>>>
>>> Query2:
>>>  ASK { ?x ?p ?v
>>>        FILTER (?v != "IV"^^roman:numeral) }
>>> Result2: no
>>>
>>> Of course, an extended SPARQL implementation may give you a yes for
>>> the latter but the issue that will make users cock their heads shows
>>> up in the unextended implementation.
>> That's inevitable with monotonicity + extensible datatypes + ASK masking 
>> error vs false.  And that's OK.
>>
>> I still think explicitly talking about value spaces (a paragraph) will make 
>> it clearer.  Then say "=" etc works on same-value space pairs.
>>
>> If you want, I'll write this text.
> 
> That would probably help. In every concept I've had for it, the extra
> level of indirection wasn't helpful.

Received on Tuesday, 22 August 2006 10:27:19 UTC