Re: adding dawg:monotonicity and extensible data types to SPARQL query

On Tue, Aug 22, 2006 at 11:25:33AM +0100, Seaborne, Andy wrote:
> 
> 
> 
> Eric Prud'hommeaux wrote:
> >On Mon, Aug 21, 2006 at 03:06:52PM +0100, Seaborne, Andy wrote:
> >>
> >>
> >>Eric Prud'hommeaux wrote:
> >>>On Mon, Aug 14, 2006 at 01:08:03PM +0200, Eric Prud'hommeaux wrote:
> >>>http://www.w3.org/2001/sw/DataAccess/rq23/rq24#tests v1.14 has a new
> >>>draft of the Value Testing section. This does not include the
> >>>extensible datatypes support (but certainly makes it easier to add).
> >>>This version is intended to include only editorial changes from the CR
> >>>version.
> >>>
> >>>>  [DONE] ACTION: EricP to respond to PatH's new test with a proof of
> >>>>  whether it's monotonic to extended datatype support [recorded in
> >>>>  [25]http://www.w3.org/2006/08/08-dawg-minutes.html#action01]
> >>>>  <fred> literal = literal: true or error
> >>>>
> >>>>  <fred> iri = iri: true or false
> >>>>
> >>>>  <fred> bnode = bnode: true or false
> >>>>
> >>>>  <fred> allother cells always false
> >>>>
> >>>>  2=3
> >>>>
> >>>>  <AndyS> Yes, Fred - that's the table I was thinking of.
> >>>In 1.14, I've updated RDFterm-equal to the following:
> >>>
> >>>http://www.w3.org/2001/sw/DataAccess/rq23/rq24#func-RDFterm-equal
> >>>[[
> >>>Returns TRUE if term1 and term2 are the same RDF term as defined in
> >>>Resource Description Framework (RDF): Concepts and Abstract Syntax
> >>>[CONCEPTS]; produces a type error if the arguments are both literal
> >>>but are not the same RDF term;
> >>Isn't this a bit circular as to "same RDF term"?  Something about the 
> >>equality of the three parts of lexical form, datatype and lang tag (for 
> >>literals) etc etc.
> >
> >That comes from the following text:
> >[[
> >term1 and term2 are the same if any of the following is true:
> >
> >    * term1 and term2 are equivalent IRIs as defined in 6.4 RDF URI
> >      References.
> >    * term1 and term2 are equivalent literals as defined in 6.5.1
> >      Literal Equality.
> >    * term1 and term2 are the same blank node as described in 6.6
> >      Blank Nodes.
> >]]
> 
> Yes - agreed - I was pointing out that the text before that is worded in a 
> circular fashion.
> 
> """produces a type error if the arguments are both literal but are not the 
> same RDF term;"""
> 
> Because it comes before the definition of "same", it's confusing.

The very first phrase in the description of RDFterm-equal is
[[
Returns TRUE if term1 and term2 are the same RDF term as defined in
Resource Description Framework (RDF): Concepts and Abstract Syntax
[CONCEPTS]
]]
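
As a rough illustration of the rule as it stands in 1.14 (same term gives TRUE, two literals that are not the same term give a type error, anything else gives FALSE), here is a minimal Python sketch. The names IRI, BNode, Literal and SparqlTypeError are made up for the example and do not come from the draft, and language tags are compared exactly, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error (hypothetical name)."""


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def rdfterm_equal(term1, term2) -> bool:
    # Same RDF term: same kind of term with identical components.
    if term1 == term2:
        return True
    # Two literals that are not the same term: their values might still be
    # equal under a datatype this implementation does not know, so the
    # answer is a type error rather than FALSE.
    if isinstance(term1, Literal) and isinstance(term2, Literal):
        raise SparqlTypeError("literals are not the same RDF term")
    # Every other combination is simply unequal.
    return False
```

With these definitions, comparing an IRI with a blank node returns False, while comparing "II"^^roman:numeral with "IV"^^roman:numeral raises the type error instead of returning False.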

> 
> >
> >>>returns FALSE otherwise. term1 and
> >>>term2 are the same if any of the following is true:
> >>>
> >>>   * term1 and term2 are equivalent IRIs as defined in 6.4 RDF URI
> >>>     References.
> >>>   * term1 and term2 are equivalent literals as defined in 6.5.1
> >>>     Literal Equality.
> >>>   * term1 and term2 are the same blank node as described in 6.6
> >>>     Blank Nodes.
> >>>]]
> >>>
> >>>I added the "; produces a type error if the arguments are both literal
> >>>but are not the same RDF term; returns FALSE otherwise" bit. The rest
> >>>was already there.
> >>Suggestion for a name for this : "unknown-equals" or 
> >>"general-value-equals" and note that "=" may have been intercepted by a 
> >>datatype specific definition of "=".
> >
> >Earlier in 11.3.1 Operator Extensibility:
> >[[
> >Extended SPARQL implementations may support additional associations
> >between operators and operator functions;
> >]]
> 
> I still suggest that the name be changed to "general-value-equals" or some 
> such.  The note was to refer back to the text you quote to be clear.  I'm 
> not saying the text as given was wrong, but that it could be clearer if it 
> reiterated that the "=" symbol may have been overridden.
> 
> >
> >
> >>There should be text to give examples; and also for !=.
> >>
> >>Let's reserve "term-equals" language for a syntactic test, and not have 
> >>it generate an error, because "term equality" suggests syntax (to me at 
> >>least) without regard to value.
> >>
> >>An operator such as "sameTerm(?x, ?y)" would provide direct access to it. 
> >>It's shorthand for something like:
> >>
> >>( isURI(?x) && isURI(?y) && str(?x) = str(?y) ) ||
> >>( isBlank(?x) && isBlank(?y) && ... same labels .... ) ||
> >>( isLiteral(?x) && isLiteral(?y) &&
> >>  str(?x) = str(?y) &&
> >>   (
> >>     ( lang(?x) = "" && lang(?y) = "" &&           # Same datatype, if any
> >>        ( datatype(?x) = datatype(?y) || true ) )
> >>   ||
> >>     ( lang(?x) = lang(?y) )                       # Same lang, if any
> >>   )
> >>)
> >>
> >>The literal part is complex (and probably not correct in the above) 
> >>because of lang tags and datatypes (and it's asymmetric in the treatment 
> >>of no lang tag and no datatype).
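
For comparison, a purely syntactic test is much simpler to state outside SPARQL. The following Python sketch is only an illustration: terms are modelled as plain tuples, ("iri", value), ("bnode", label) or ("literal", lexical, datatype, lang), with None for an absent datatype or language tag. That encoding and the name same_term are assumptions made for the example. Using None for "absent" makes the treatment of missing lang tags and missing datatypes symmetric, and the function never raises an error.

```python
def same_term(x, y) -> bool:
    """Syntactic comparison of two RDF terms encoded as tuples."""
    kind_x, kind_y = x[0], y[0]
    if kind_x != kind_y:
        return False
    if kind_x in ("iri", "bnode"):
        # IRIs compare by their string, blank nodes by their label.
        return x[1] == y[1]
    # Literals: lexical form, datatype and language tag must all match.
    _, lex_x, dt_x, lang_x = x
    _, lex_y, dt_y, lang_y = y
    return lex_x == lex_y and dt_x == dt_y and lang_x == lang_y
```

Two literals with different lexical forms, such as "II"^^roman:numeral and "IV"^^roman:numeral, simply give False here, whatever their values might be.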
> >
> >Ah, good. Had thought about looking for this equivalence.
> >
> >>There is no way to get the label of a bNode (which is OK).
> >>
> >>I assume datatype("eric"@fr) is an error - I can't find anything in rq24
> >
> >Hmm, I expect that presently, it's not an error; it's just not useful
> >to write it 'cause it won't match valid RDF data. I guess we've gone
> >with the more restrictive approach on other counts, like literal
> >subjects, so the precedent is there to say that is specifically malformed.
> 
> 
> On IRC, Eric said that datatype() has signature typed literal and
> 
> datatype("plain literal") is an error.
> 
> RDF MT (rules xsd 1a, xsd 1b) allow plain literals and xsd:strings to be 
> used interchangeably, making the value spaces of plain literals and 
> xsd:strings the same.
> 
> >
> >>>>  <AndyS> bNode = literal (not bNode in query) may be valid
> >>>>
> >>>>  <AndyS> Separate sameLiteral operator.
> >>>>
> >>>>  <AndyS> if we want a syntactic comparison
> >>>>
> >>>>  <AndyS> "(x,y)"^^:geo
> >>>>
> >>>>  <AndyS> If you want help with this, do ask - I'm the one keen to have
> >>>>  this extensibility so I feel responsible here.
> >>>>
> >>>>  <kendallclark> ACTION: EricP to redraft section 11 to support
> >>>>  extensible datatypes [recorded in
> >>>>  [18]http://www.w3.org/2006/08/08-dawg-minutes.html#action08]
> >>>To this end, I propose the following addendum to the derived types list:
> >>>[[
> >>>Extended SPARQL implementations may treat additional types as being
> >>>derived from numeric types.
> >>>]]
> >>There is no need to restrict things to numerics.  Any new value space is 
> >>possible.  Examples:
> >>
> >>1/ xsd:dates
> >>2/ Things with units.
> >>   For a sufficiently knowledgeable processor:
> >>   "273"^^:kelvin should not compare with "273"^^xsd:integer [*]
> >>   "273"^^:kelvin should compare with "+273"^^:kelvin
> >>   "275"^^:kelvin should compare with "2"^^:centigrade
> >>
> >>[*] Let's not confuse recording temperature as a number with recording it 
> >>as a unit datatype.  :kelvin(273) would be needed.
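
To make the units example concrete, here is a small Python sketch of a processor that knows the two temperature datatypes. Everything in it is an assumption for illustration: literals are (lexical form, datatype) pairs, the datatype names are taken from the example above, and the offset of 273 is the rounded figure the example implies (the physical value is 273.15).

```python
from decimal import Decimal


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error (hypothetical name)."""


# Rounded offset implied by the example; 273.15 would be physically exact.
CENTIGRADE_OFFSET = Decimal("273")

# Map each known datatype to a function from lexical form to kelvin.
TO_KELVIN = {
    ":kelvin": lambda lexical: Decimal(lexical),
    ":centigrade": lambda lexical: Decimal(lexical) + CENTIGRADE_OFFSET,
}


def temperature_equal(a, b) -> bool:
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a not in TO_KELVIN or dt_b not in TO_KELVIN:
        # A plain number is not in the temperature value space.
        raise SparqlTypeError("operands are not both temperatures")
    return TO_KELVIN[dt_a](lex_a) == TO_KELVIN[dt_b](lex_b)
```

Under these assumptions "273"^^:kelvin equals "+273"^^:kelvin, "275"^^:kelvin equals "2"^^:centigrade, and comparing "273"^^:kelvin with "273"^^xsd:integer is a type error.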
> >
> >The text, in context, does not limit SPARQL implementations to
> >extending numeric datatypes. (This is the pain of not just committing
> >the text and having people look at it in situ.) Because numerics have
> >a prescribed hierarchy in SPARQL, I needed to enumerate the minimally
> >supported numeric data types. The above addendum points out that
> >implementations may add to that list; meaning, respect the subtype
> >substitution rules even with regards to the extended types.
> >
> >Adding support for kelvin, date or other primitive data types would be
> >addressed by adding new associations between operators and operator
> >functions.
> >
> >One piece missing in the puzzle is the subtypes of these types. This
> >is, I believe, best addressed by altering 11.3 ¶2 to not scope the
> >subtype substitution to numerics:
> >[[
> >SPARQL follows XPath's scheme for type promotions and subtype
> >substitution for arguments to operators (see XML Path Language (XPath)
> >2.0 [XPATH20] for definitions of numeric type promotions and subtype
> >substitution). The XPath Operator Mapping rules for numeric operands
> >{xs:integer, xs:decimal, xs:float, xs:double, and types derived from a
> >numeric type} apply to SPARQL operators as well. Some of the operators
> >are associated with nested function expressions,
> >e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath
> >definitions, fn:not and op:numeric-equal produce an error if their
> >argument is an error.
> >]]
> >PROPOSED: to adopt the above text. An illustrative example is:
> >
> >Type Extensions:
> >  <xs:schema
> >     targetNamespace="http://example.com/wannadate"
> >     xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >    <xs:simpleType id="Canonical21stCenturyDate">
> >      <xs:restriction base="xs:date">
> >        <xs:pattern value="2[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"/>
> >      </xs:restriction>
> >    </xs:simpleType>
> >  </xs:schema>
> >
> >Data:
> >  @prefix dt:     <http://example.com/wannadate> .
> >  @prefix meeting: <http://example.com/meeting#> .
> >  meeting:m1 meeting:date "2005-02-03"^^dt:Canonical21stCenturyDate .
> >  meeting:m2 meeting:date "2006-02-01"^^dt:Canonical21stCenturyDate .
> >
> >Query:
> >  PREFIX dt: <http://example.com/wannadate>
> >  PREFIX meeting: <http://example.com/meeting#>
> >  SELECT ?m2
> >   WHERE { ?m1 meeting:date ?m1Date .
> >           ?m2 meeting:date ?m2Date
> >           FILTER ( ?m1Date > ?m2Date ) }
> >
> >Unextended Result:
> >  +-----+
> >  | ?m2 |
> >  +-----+
> >  +-----+
> >
> >Extended Result:
> >  +------------+
> >  |    ?m2     |
> >  +------------+
> >  | meeting:m1 |
> >  +------------+
> >
> >>>and a new minor section following the operator table:
> >>>[[
> >>>11.3.1 Operator Extensibility
> >>>
> >>>Extended SPARQL implementations may support additional associations
> >>>between operators and operator functions; this amounts to adding rows
> >>>to the table above. No additional operator support may yield a result
> >>>that replaces any result other than a type error in an unextended
> >>>implementation. The consequence of this rule is that extended SPARQL
> >>>implementations will produce at least the same solutions as an
> >>>unextended implementation, and may, for some queries, produce more
> >>>solutions.
> >>>]]
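
One way to read the constraint in 11.3.1 is as a check over result tables: an extension may change an outcome only where the unextended implementation gave a type error. The Python sketch below illustrates that reading. The function name, the ERROR marker and the dictionary encoding (operand pair mapped to True, False or ERROR) are assumptions made for the example.

```python
ERROR = "error"


def replaces_only_errors(unextended, extended) -> bool:
    """Check that `extended` differs from `unextended` only on type errors.

    Both arguments map an operand pair to True, False or ERROR. Pairs
    missing from `extended` are taken to keep the unextended result.
    """
    for operands, base_result in unextended.items():
        new_result = extended.get(operands, base_result)
        if base_result != ERROR and new_result != base_result:
            # A definite TRUE or FALSE was overridden: not allowed.
            return False
    return True
```

An extension that turns the error for ("II"^^roman:numeral, "IV"^^roman:numeral) into False passes the check; one that turns an existing False into True does not.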
> >>The text "and may, for some queries, produce more solutions" won't be 
> >>true because we have logical not.
> >
> >Can you find a counter example?
> 
> The problem is using the word "query", not restricted to "expressions".  
> The usual OPTIONAL/BOUND trick is always going to provide loopholes because 
> it's outside a FILTER.
> 
> Data:
> :x :p "45"^^:dtype .
> 
> Query:
> ASK { OPTIONAL { :x :p ?v . FILTER ( ?v < "67"^^:dtype ) }
>       FILTER (bound(?v)) }

Unextended Result:
  +----+
  | no |
  +----+

Type Extensions:
  <xs:schema
     targetNamespace="http://example.com/oddness"
     xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:simpleType id="dtype">
      <xs:restriction base="xs:decimal">
        <xs:pattern value="[\-+]?[0-9]*[02468]"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:schema>

Extended Result:
  +-----+
  | yes |
  +-----+

I don't see this as a counter example.

> I've just noticed that "45"^^:dtype <= "45"^^:dtype is error by your 
> proposed design but can be true by explicitly having value-compare as I 
> described.  I take back my comment that I thought your proposal was the 
> same.

The WG truth table and discussion did not take into account anything
about the decomposition of <= into < and =. We can take up adding
that, if you like, but I don't see where my proposed implementation of
the WG discussion differs in any way from the semantics discussed
during the telecon.

> >>>I think this behaves exactly as sop:value-compare would.
> >>>
> >>>
> >>>Cost:
> >>>
> >>>Is the cost of using the same operator for value comparison and symbol
> >>>comparison less than that of making users use a different operator for
> >>>RDFterm-equal? I think it's a matter of taste. The weird case in this
> >>>solution is that you can't negate a syntactic literal equivalence
> >>>test.
> >>This isn't symbol comparison any more because the backstop "=" does not 
> >>work on all symbol combinations (unknown datatypes, different lexical 
> >>forms).
> >
> >Again, I have to ask for a counter example.
> 
> My point was that the app may want access to the syntactic form, even when 
> there is a value form (e.g. validation).  Overtaken by your other email 
> so I'll reply to that.
> 
> 	Andy
> 
> >
> >>>Data:
> >>> <x> <p> "II"^^roman:numeral .
> >>>
> >>>Query1:
> >>> ASK { ?x ?p ?v
> >>>       FILTER (?v = "IV"^^roman:numeral) }
> >>>Result1: no
> >>>
> >>>Query2:
> >>> ASK { ?x ?p ?v
> >>>       FILTER (?v != "IV"^^roman:numeral) }
> >>>Result2: no
> >>>
> >>>Of course, an extended SPARQL implementation may give you a yes for
> >>>the latter but the issue that will make users cock their heads shows
> >>>up in the unextended implementation.
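
The reason both queries answer "no" is that a FILTER drops a solution when its expression is an error, and negating an error is still an error. A minimal Python sketch of that behaviour follows. The function names, the (lexical form, datatype) encoding of literals and the ROMAN_VALUES lookup are all hypothetical stand-ins, not real roman:numeral support.

```python
class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error (hypothetical name)."""


# Hypothetical lookup standing in for real roman:numeral support.
ROMAN_VALUES = {"II": 2, "IV": 4}


def unextended_equal(a, b) -> bool:
    # Literals only: identical terms are equal, anything else is an error.
    if a == b:
        return True
    raise SparqlTypeError("literals are not the same RDF term")


def extended_equal(a, b) -> bool:
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a == dt_b == "roman:numeral":
        return ROMAN_VALUES[lex_a] == ROMAN_VALUES[lex_b]
    return unextended_equal(a, b)


def ask(values, test) -> bool:
    """ASK with a FILTER: an error in the test drops that solution."""
    for value in values:
        try:
            if test(value):
                return True
        except SparqlTypeError:
            continue
    return False


DATA = [("II", "roman:numeral")]
TARGET = ("IV", "roman:numeral")

query1 = ask(DATA, lambda v: unextended_equal(v, TARGET))
query2 = ask(DATA, lambda v: not unextended_equal(v, TARGET))
query2_extended = ask(DATA, lambda v: not extended_equal(v, TARGET))
```

Here query1 and query2 are both False, because the error propagates through the negation and ASK cannot tell it apart from a plain false, while query2_extended is True.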
> >>That's inevitable with monotonicity + extensible datatypes + ASK masking 
> >>error vs false.  And that's OK.
> >>
> >>I still think explicitly talking about value spaces (a paragraph) will 
> >>make it clearer.  Then say "=" etc works on same-value space pairs.
> >>
> >>If you want, I'll write this text.
> >
> >That would probably help. In every concept I've had for it, the extra
> >level of indirection wasn't helpful.
> 

-- 
-eric

home-office: +1.617.395.1213 (usually 900-2300 CET)
	    +33.1.45.35.62.14
cell:       +33.6.73.84.87.26

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Tuesday, 22 August 2006 13:01:16 UTC