Re: adding dawg:monotonicity and extensible data types to SPARQL query

On Mon, Aug 21, 2006 at 03:06:52PM +0100, Seaborne, Andy wrote:
> 
> 
> 
> Eric Prud'hommeaux wrote:
> >On Mon, Aug 14, 2006 at 01:08:03PM +0200, Eric Prud'hommeaux wrote:
> >http://www.w3.org/2001/sw/DataAccess/rq23/rq24#tests v1.14 has a new
> >draft of the Value Testing section. This does not include the
> >extensible datatypes support (but certainly makes it easier to add).
> >This version is intended to include only editorial changes from the CR
> >version.
> >
> >>   [DONE] ACTION: EricP to respond to PatH's new test with a proof of
> >>   whether it's monotonic to extended datatype support [recorded in
> >>   [25]http://www.w3.org/2006/08/08-dawg-minutes.html#action01]
> >
> >>   <fred> literal = literal: true or error
> >>
> >>   <fred> iri = iri: true or false
> >>
> >>   <fred> bnode = bnode: true or false
> >>
> >>   <fred> allother cells always false
> >>
> >>   2=3
> >>
> >>   <AndyS> Yes, Fred - that's the table I was thinking of.
> >
> >In 1.14, I've updated RDFterm-equal to the following:
> >
> >http://www.w3.org/2001/sw/DataAccess/rq23/rq24#func-RDFterm-equal
> >[[
> >Returns TRUE if term1 and term2 are the same RDF term as defined in
> >Resource Description Framework (RDF): Concepts and Abstract Syntax
> >[CONCEPTS]; produces a type error if the arguments are both literal
> >but are not the same RDF term;
> 
> Isn't this a bit circular as to "same RDF term"?  Something about the 
> equality of the three parts of lexical form, datatype and lang tag (for 
> literals) etc etc.

That comes from the following text:
[[
term1 and term2 are the same if any of the following is true:

    * term1 and term2 are equivalent IRIs as defined in 6.4 RDF URI
      References.
    * term1 and term2 are equivalent literals as defined in 6.5.1
      Literal Equality.
    * term1 and term2 are the same blank node as described in 6.6
      Blank Nodes.
]]
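
A minimal Python sketch of the 1.14 wording (TRUE for the same term, a
type error for two literals that are not the same term, FALSE
otherwise). The Term class, the SparqlTypeError name and the
field-by-field comparison are assumptions made for illustration; they
are not taken from rq24, and language-tag case normalization is
ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    kind: str                       # "iri", "bnode", or "literal"
    value: str                      # IRI string, blank node identity, or lexical form
    datatype: Optional[str] = None  # literals only
    lang: Optional[str] = None      # literals only


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error raised while evaluating a FILTER."""


def rdfterm_equal(term1: Term, term2: Term) -> bool:
    # Frozen dataclass equality compares every field, which approximates
    # "same RDF term" for this sketch.
    if term1 == term2:
        return True
    if term1.kind == "literal" and term2.kind == "literal":
        # Two literals that are not the same term: an extension might know
        # they denote the same value, so the unextended answer is an error.
        raise SparqlTypeError("literals are not the same RDF term")
    return False
```

This reproduces fred's table: literal = literal is true or error,
iri = iri and bnode = bnode are true or false, and every other cell is
false.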

> >returns FALSE otherwise. term1 and
> >term2 are the same if any of the following is true:
> >
> >    * term1 and term2 are equivalent IRIs as defined in 6.4 RDF URI
> >      References.
> >    * term1 and term2 are equivalent literals as defined in 6.5.1
> >      Literal Equality.
> >    * term1 and term2 are the same blank node as described in 6.6
> >      Blank Nodes.
> >]]
> >
> >I added the "; produces a type error if the arguments are both literal
> >but are not the same RDF term; returns FALSE otherwise" bit. The rest
> >was already there.
> 
> Suggestion for a name for this : "unknown-equals" or "general-value-equals" 
> and note that "=" may have been intercepted by a datatype specific 
> definition of "=".

Earlier in 11.3.1 Operator Extensibility:
[[
Extended SPARQL implementations may support additional associations
between operators and operator functions;
]]


> There should be text to give examples; and also for !=.
> 
> Let's reserve "term-equals" language for a syntactic test and not having it 
> generate an error because "term equality" suggests syntax (to me at least) 
> without regard to value.
> 
> An operator such as "sameTerm(?x, ?y)" would provide direct access to it;
> it's shorthand for something like:
> 
> ( isURI(?x) && isURI(?y) && str(?x) = str(?y) ) ||
> ( isBlank(?x) && isBlank(?y) && ... same labels .... ) ||
> ( isLiteral(?x) && isLiteral(?y) &&
>   str(?x) = str(?y) &&
>    (
>      ( lang(?x) = "" && lang(?y) = "" &&            # Same datatype, if any
>         ( datatype(?x) = datatype(?y) || true ) )
>    ||
>      ( lang(?x) = lang(?y) )                         # Same lang, if any
>    )
> )
> 
> The literal part is complex (and probably not correct in the above) because 
> of lang tags and datatypes (and it's asymmetric in the treatment of no lang 
> tag and no datatype).

Ah, good. Had thought about looking for this equivalence.
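
Outside the query language the same test is easier to state, because
the term's parts can be compared directly. A rough Python sketch,
assuming a hypothetical Term tuple in which a missing datatype or
language tag is None and a blank node is identified by its identity in
the store (which a query cannot reach):

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    kind: str                       # "iri", "bnode", or "literal"
    value: str                      # IRI string, bnode identity, or lexical form
    datatype: Optional[str] = None
    lang: Optional[str] = None


def same_term(x: Term, y: Term) -> bool:
    """Purely syntactic comparison: never raises, never consults value spaces."""
    if x.kind != y.kind:
        return False
    if x.kind in ("iri", "bnode"):
        return x.value == y.value
    # Literals: lexical form, datatype and language tag must all match.
    # None == None covers "no datatype" and "no language tag" symmetrically.
    return (x.value == y.value
            and x.datatype == y.datatype
            and x.lang == y.lang)
```

Under this sketch "273"^^:kelvin and "+273"^^:kelvin are different
terms, whatever a value-aware "=" might say about them.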

> There is no way to get the label of a bNode (which is OK).
> 
> I assume datatype("eric"@fr) is an error - I can't find anything in rq24

Hmm, I expect that presently, it's not an error; it's just not useful
to write it 'cause it won't match valid RDF data. I guess we've gone
with the more restrictive approach on other counts, like literal
subjects, so the precedent is there to say that is specifically malformed.

> >>   <AndyS> bNode = literal (not bNode in query) may be valid
> >>
> >>   <AndyS> Separate sameLiteral operator.
> >>
> >>   <AndyS> if we want a syntactic comparison
> >>
> >>   <AndyS> "(x,y)"^^:geo
> >>
> >>   <AndyS> If you want help with this, do ask - I'm the one keen to have
> >>   this extensibility so I feel responsible here.
> >>
> >>   <kendallclark> ACTION: EricP to redraft section 11 to support
> >>   extensible datatypes [recorded in
> >>   [18]http://www.w3.org/2006/08/08-dawg-minutes.html#action08]
> >
> >To this end, I propose the following addendum to the derived types list:
> >[[
> >Extended SPARQL implementations may treat additional types as being
> >derived from numeric types.
> >]]
> 
> There is no need to restrict things to numerics.  Any new value space is 
> possible.  Examples:
> 
> 1/ xsd:dates
> 2/ Things with units.
>    For a sufficiently knowledgeable processor:
>    "273"^^:kelvin should not compare with "273"^^xsd:integer [*]
>    "273"^^:kelvin should compare with "+273"^^:kelvin
>    "275"^^:kelvin should compare with "2"^^:centigrade
> 
> [*] Let's not confuse recording a temperature as a number with recording it 
> as a unit datatype.  :kelvin(273) would be needed.

The text, in context, does not limit SPARQL implementations to
extending numeric datatypes. (This is the pain of not just committing
the text and having people look at it in situ.) Because numerics have
a prescribed hierarchy in SPARQL, I needed to enumerate the minimally
supported numeric data types. The above addendum points out that
implementations may add to that list; meaning, respect the subtype
substitution rules even with regards to the extended types.

Adding support for kelvin, date or other primitive data types would be
addressed by adding new associations between operators and operator
functions.
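
As a rough Python sketch of what "adding new associations" could look
like, treat the extra rows as a lookup keyed by the operand datatypes,
consulted before the general backstop. The KELVIN IRI, the rows layout
and the float-based comparison are all hypothetical; nothing here is
prescribed by rq24.

```python
from typing import Callable, Dict, Tuple

Literal = Tuple[str, str]                      # (lexical form, datatype IRI)
OperatorFunction = Callable[[str, str], bool]  # compares two lexical forms
Rows = Dict[Tuple[str, str], OperatorFunction]

KELVIN = "http://example.com/units#kelvin"     # hypothetical datatype IRI


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error."""


def evaluate_equal(a: Literal, b: Literal, rows: Rows) -> bool:
    """Evaluate a = b using extra operator-table rows keyed by datatype pair."""
    operator_function = rows.get((a[1], b[1]))
    if operator_function is not None:
        return operator_function(a[0], b[0])
    # Backstop in the spirit of RDFterm-equal: identical literals are equal,
    # anything else is a type error rather than false.
    if a == b:
        return True
    raise SparqlTypeError(f"cannot compare {a} with {b}")


unextended: Rows = {}
extended: Rows = {(KELVIN, KELVIN): lambda x, y: float(x) == float(y)}
```

With the extended table, "273"^^:kelvin = "+273"^^:kelvin evaluates to
true; with the unextended table the same comparison is a type error.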

One piece missing in the puzzle is the subtypes of these types. This
is, I believe, best addressed by altering 11.3 ¶2 to not scope the
subtype substitution to numerics:
[[
SPARQL follows XPath's scheme for type promotions and subtype
substitution for arguments to operators (see XML Path Language (XPath)
2.0 [XPATH20] for definitions of numeric type promotions and subtype
substitution). The XPath Operator Mapping rules for numeric operands
{xs:integer, xs:decimal, xs:float, xs:double, and types derived from a
numeric type} apply to SPARQL operators as well. Some of the operators
are associated with nested function expressions,
e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath
definitions, fn:not and op:numeric-equal produce an error if their
argument is an error.
]]
PROPOSED: to adopt the above text. An illustrative example is:

Type Extensions:
  <xs:schema
     targetNamespace="http://example.com/wannadate"
     xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:simpleType name="Canonical21stCenturyDate">
      <xs:restriction base="xs:date">
        <xs:pattern value="2[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:schema>

Data:
  @prefix dt:     <http://example.com/wannadate#> .
  @prefix meeting: <http://example.com/meeting#> .
  meeting:m1 meeting:date "2005-02-03"^^dt:Canonical21stCenturyDate .
  meeting:m2 meeting:date "2006-02-01"^^dt:Canonical21stCenturyDate .

Query:
  PREFIX dt: <http://example.com/wannadate#>
  PREFIX meeting: <http://example.com/meeting#>
  SELECT ?m2
   WHERE { ?m1 meeting:date ?m1Date .
           ?m2 meeting:date ?m2Date
           FILTER ( ?m1Date > ?m2Date ) }

Unextended Result:
  +-----+
  | ?m2 |
  +-----+
  +-----+

Extended Result:
  +------------+
  |    ?m2     |
  +------------+
  | meeting:m1 |
  +------------+
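
The difference between the two results comes down to whether the
processor may substitute the derived type for xs:date when evaluating
">". A small Python sketch of that substitution follows. It assumes the
processor already has a ">" row for xs:date, uses a single-step
derived_from map, and uses a hypothetical IRI for the derived type.

```python
from datetime import date
from typing import Dict, Tuple

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
# Assumed IRI for the derived type in the example schema.
C21_DATE = "http://example.com/wannadate#Canonical21stCenturyDate"

Literal = Tuple[str, str]  # (lexical form, datatype IRI)


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error."""


def greater_than(a: Literal, b: Literal, derived_from: Dict[str, str]) -> bool:
    """Compare two date literals, substituting known subtypes for xs:date."""
    def as_date(lit: Literal) -> date:
        lexical, datatype = lit
        # One-step lookup: a type the processor does not know maps to itself.
        base = derived_from.get(datatype, datatype)
        if base != XSD_DATE:
            raise SparqlTypeError(f"no '>' defined for {datatype}")
        return date.fromisoformat(lexical)
    return as_date(a) > as_date(b)


unextended: Dict[str, str] = {}
extended = {C21_DATE: XSD_DATE}
```

Comparing the 2006 literal with the 2005 literal yields true under the
extended map and a type error under the unextended one, so the
unextended FILTER rejects every solution.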

> >
> >and a new minor section following the operator table:
> >[[
> >11.3.1 Operator Extensibility
> >
> >Extended SPARQL implementations may support additional associations
> >between operators and operator functions; this amounts to adding rows
> >to the table above. No additional operator support may yield a result
> >that replaces any result other than a type error in an unextended
> >implementation. The consequence of this rule is that extended SPARQL
> >implementations will produce at least the same solutions as an
> >unextended implementation, and may, for some queries, produce more
> >solutions.
> >]]
> 
> The text "and may, for some queries, produce more solutions" won't be true 
> because we have logical not.

Can you find a counter example?
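
For the narrow case of "!" applied directly to an operator result, the
question can be checked exhaustively. The Python sketch below assumes
"!" propagates errors the way the 11.3 text describes for fn:not, and
that an extension may only replace a type error. It says nothing about
constructs such as bound() that can observe whether an OPTIONAL
matched.

```python
ERROR = "error"                      # marker for a type error result
BASE_RESULTS = (True, False, ERROR)


def logical_not(value):
    """Error-propagating negation."""
    return ERROR if value is ERROR else (not value)


def filter_keeps(value) -> bool:
    """A FILTER keeps a solution only when its expression is true."""
    return value is True


def allowed_upgrades(base):
    """Results an extension may produce where the base produced `base`."""
    return (True, False, ERROR) if base is ERROR else (base,)


def negation_loses_solutions() -> bool:
    """True if some permitted upgrade drops a solution the base kept."""
    for base in BASE_RESULTS:
        for extended in allowed_upgrades(base):
            kept_before = filter_keeps(logical_not(base))
            kept_after = filter_keeps(logical_not(extended))
            if kept_before and not kept_after:
                return True
    return False
```

Under these assumptions negation_loses_solutions() returns False: the
only solutions the base keeps under "!" come from a base result of
false, and an extension may not change that result.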

> >
> >I think this behaves exactly as sop:value-compare would.
> >
> >
> >Cost:
> >
> >Is the cost of using the same operator for value comparison and symbol
> >comparison less than that of making users use a different operator for
> >RDFterm-equal? I think it's a matter of taste. The weird case in this
> >solution is that you can't negate a syntactic literal equivalence
> >test.
> 
> This isn't symbol comparison any more because the backstop "=" does not 
> work on all symbol combinations (unknown datatypes, different lexical 
> forms).

Again, I have to ask for a counter example.

> >
> >Data:
> >  <x> <p> "II"^^roman:numeral .
> >
> >Query1:
> >  ASK { ?x ?p ?v
> >        FILTER (?v = "IV"^^roman:numeral) }
> >Result1: no
> >
> >Query2:
> >  ASK { ?x ?p ?v
> >        FILTER (?v != "IV"^^roman:numeral) }
> >Result2: no
> >
> >Of course, an extended SPARQL implementation may give you a yes for
> >the latter but the issue that will make users cock their heads shows
> >up in the unextended implementation.
> 
> That's inevitable with monotonicity + extensible datatypes + ASK masking 
> error vs false.  And that's OK.
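
The masking can be reproduced in a few lines of Python. This is only a
sketch: the data is reduced to a list of lexical forms, the extended
implementation is assumed to understand small Roman numerals built
from I, V and X, and all the names are hypothetical.

```python
VALUES = {"I": 1, "V": 5, "X": 10}


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error."""


def roman_value(lexical: str) -> int:
    """Value of a small Roman numeral, handling subtractive pairs like IV."""
    digits = [VALUES[ch] for ch in lexical]
    total = 0
    for i, d in enumerate(digits):
        smaller_than_next = i + 1 < len(digits) and d < digits[i + 1]
        total += -d if smaller_than_next else d
    return total


def equal(a: str, b: str, extended: bool) -> bool:
    """'=' on two roman:numeral literals, with or without the extension."""
    if extended:
        return roman_value(a) == roman_value(b)
    if a == b:
        return True
    raise SparqlTypeError("two different literals of an unknown datatype")


def ask(data, test) -> str:
    """ASK answers yes if any value passes the filter; an error is no match."""
    for value in data:
        try:
            if test(value):
                return "yes"
        except SparqlTypeError:
            continue
    return "no"


data = ["II"]
for extended in (False, True):
    q1 = ask(data, lambda v: equal(v, "IV", extended))       # ?v = "IV"
    q2 = ask(data, lambda v: not equal(v, "IV", extended))   # ?v != "IV"
    print(extended, q1, q2)
# prints:
# False no no
# True no yes
```

The unextended run answers no to both queries because each filter ends
in a type error, which ASK cannot distinguish from false.
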
> 
> >
> 
> I still think explicitly talking about value spaces (a paragraph) will make 
> it clearer.  Then say "=" etc works on same-value space pairs.
> 
> If you want, I'll write this text.

That would probably help. In every concept I've had for it, the extra
level of indirection wasn't helpful.
-- 
-eric

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Monday, 21 August 2006 21:40:45 UTC