- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Wed, 09 Aug 2006 10:04:11 +0100
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: Pat Hayes <phayes@ihmc.us>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Eric Prud'hommeaux wrote:
> On Mon, Aug 07, 2006 at 12:55:08PM +0100, Seaborne, Andy wrote:
>>
>>
>> Eric Prud'hommeaux wrote:
>>> On Fri, Aug 04, 2006 at 10:48:34AM +0100, Seaborne, Andy wrote:
>>>> How about a scheme like this for comparison of literals:
>>>>
>>>> 1/ Be explicit about value spaces; the design is comparison by-value.
>>>>
>>>> All operators return true if the implementation positively knows that the
>>>> two values compare as needed, return false if the implementation
>>>> positively knows that the two values do not compare as needed and return
>>>> error if it does not know.
>>>>
>>>> http://www.w3.org/TR/xmlschema-2/#value-space
>>>>
>>>> 2/ Define sop:value-compare(A, B) to be -1, 0, 1 or error depending on
>>>> whether A and B are less than, equal, greater than, or it's an unknown
>>>> comparison.
>>>>
>>>> Note that sop:value-compare can be partial.  A processor always knows A
>>>> = B without much else if the lexical forms and datatypes match.
>>>>
>>>> 3/ Define =, !=, <, <=, >, >= to be the relevant result(s) of
>>>> value-compare
>>>>
>>>> 4/ State which datatypes are required for a SPARQL engine (this
>>>> could even be less than the current set; xsd:int but not arbitrary length
>>>> integers; no decimals, or no dateTime which are a bit larger in
>>>> implementation costs).
>>>>
>>>> 5/ Show that value-compare maps to the "XPath Tests" table for the
>>>> operators where an implementation provides them.
>>> I found it more intuitive to use the XPath tests directly. Proposal below.
>> I'm not sure what your position is with respect to points 1-4.  Is the
>> proposal below about how to express the use of F&O tests for
>> sop:value-compare for the known datatypes or is that text the only mapping
>> from grammar items to F&O?
>
> Let's see, what is our critical requirement:
> monotonic WRT (returned) solutions
>
> => no query string has a naive interpretation that yields a non-error
> value and a clever interpretation that yields a different value.
>
> => no operator can be mapped to two operator functions, depending on
> the implementation, where the naive implementation yields a
> non-error and the clever implementation yields a different value.
>
> eg. "'II'^^roman:numeral = 2" must yield an error in the naive
> implementation (or it could yield false (unequal strings), but
> *every* implementation would have to yield a false).
>
> I don't see how sop:value-compare helps us. We really need the
> distinction in the query string, so that the same query can not be
> interpreted two ways.

value-compare('II'^^roman:numeral, 2) is error or EQ, meeting the first point
above (using value-compare(A,B) to return one of error, LT, GT, NE, EQ)

Every operator maps to value-compare(A,B)

= and != are always value-testing for literals.  They are

We should retain the current table for mapping for the XSD datatypes mentioned
because that covers the most common cases for users and for implementers.

> We could still handle the syntactic distinction separately and use
> this intermediate value-compare, but it seems easier for us and for
> consumers of the spec to map directly to F&O tests. For instance, each
> F&O test has a defined boolean result which means we don't have to
> invent a sop:value-compare return value that signifies != without
> indicating less than or greater than.
>
>> If it's the latter, the proposal is as before isn't it? - introduce
>> "sameLiteral" and reserve the syntax of infix operators for a fixed set of
>> datatypes?  Including "="?
>
> Exactly, though now I'm thinking that sameNode is more appropriate
> (explained below).

sameNode is a better name (its definition is the same as RDFTermEquals).

>
>> More below.

I think we covered the rest in the telecon.  If not, prompt me.
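
The value-compare scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not text from any spec draft: literals are modelled as (lexical form, datatype) pairs, PARSERS is a hypothetical registry of the datatypes a processor understands, ERROR stands in for a SPARQL evaluation error, and treating NE as an error for "<" is a guess at the intended behaviour.

```python
from enum import Enum

ERROR = "error"  # stand-in for a SPARQL evaluation error (assumption)


class Cmp(Enum):
    LT = -1
    EQ = 0
    GT = 1
    NE = "ne"            # known to differ, but no ordering is known
    UNKNOWN = "unknown"  # the processor cannot decide


# Hypothetical registry of understood datatypes: datatype -> lexical-to-value.
PARSERS = {"xsd:integer": int}


def value_compare(a, b, parsers=PARSERS):
    """Compare two (lexical, datatype) literals by value where possible."""
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a in parsers and dt_b in parsers:
        va, vb = parsers[dt_a](lex_a), parsers[dt_b](lex_b)
        return Cmp.LT if va < vb else Cmp.GT if va > vb else Cmp.EQ
    if a == b:
        # Same lexical form and same datatype: equal without further knowledge.
        return Cmp.EQ
    return Cmp.UNKNOWN


def eq(a, b):
    r = value_compare(a, b)
    return ERROR if r is Cmp.UNKNOWN else r is Cmp.EQ


def ne(a, b):
    r = eq(a, b)
    # Negating an error is still an error, so nothing is added to the results.
    return ERROR if r == ERROR else not r


def lt(a, b):
    r = value_compare(a, b)
    # Assumption: "differs, ordering unknown" cannot answer "<".
    return ERROR if r in (Cmp.UNKNOWN, Cmp.NE) else r is Cmp.LT


two = ("2", "xsd:integer")
roman = ("II", "roman:numeral")
print(eq(two, roman), ne(two, roman), eq(roman, roman))
```

With only xsd:integer registered, both "=" and "!=" on the roman:numeral literal give an error, so a processor that later learns the datatype can only add answers.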
From the telecon:

The table we had was:

        | IRI    | bNode  | Literal
-----------------------------------------
IRI     | T or F | F      | F
bNode   | F      | T or F | F
Literal | F      | F      | E, T or F

literal/literal is a value comparison.

There is a function sameNode(A,B).  It is used to implement "=" and "!=" for
IRI/IRI and bNode/bNode in the table above.  It is also callable as a function
on literals where it is a syntactic test.

Fred suggested an isError(expression) operator that returns T if the
expression evaluates to an error, and is false otherwise.  This is to
facilitate data validating applications.

	Andy

>>
>>>> 6/ = and != can be defined on non-literals by RDFterm-equals as currently.
>>>>
>>>> In terms of text change and test change and implementation impact, this
>>>> is actually quite a small change because it exactly agrees on the fixed
>>>> set of datatypes we already have.  It just permits extensibility through
>>>> the principle of value testing.
>>>>
>>>> An implementation can provide more datatypes as it chooses, meeting the
>>>> "Extensible Value Testing".  It is explicitly monotonic in the
>>>> capabilities of the processor.  But now legacy or other standards for
>>>> datatypes can be added smoothly (e.g. ISO 8601 date and time which is not
>>>> exactly the same as XSD dateTime).
>>>>
>>>> Andy
>>>>
>>>>
>>>>
>>>> Pat Hayes wrote:
>>>>>> On Tue, Aug 01, 2006 at 11:19:45AM -0700, Pat Hayes wrote:
>>>>>>> Re. my action item from today's telecon.
>>>>>>>
>>>>>>> After looking at Andy's examples in
>>>>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2006AprJun/0104.html
>>>>>>> more closely, his example 6 seems to behave correctly for the issue
>>>>>>> that you were raising, if I understand it properly. In which case no
>>>>>>> further examples are needed, and my action item is moot.
>>>>>>>
>>>>>>> So let me see if I have got this right.
>>>>>>>
>>>>>>> My understanding of your concern was that we had a nonmonotonic
>>>>>>> situation because a not-equal ( !=) filter, as in example 6, behaved
>>>>>>> as follows: when faced with an unknown datatype, it would revert to a
>>>>>>> string-not-equal test on the literal string, and so succeed when the
>>>>>>> literal strings were distinct but the type URI matches; and then this
>>>>>>> success might transform to a failure when better datatyping
>>>>>>> information is available.
>>>>>> Our measure of monotonicity is that adding knowledge to the system
>>>>>> does not cause us to rescind conclusions. We should never get answers
>>>>>> from the naive implementation that we don't get from the omniscient
>>>>>> one (adding support for a datatype should not cause us to rescind
>>>>>> answers).
>>>>> Agreed.
>>>>>
>>>>>> The current text in rq2{3,4} has:
>>>>>>
>>>>>> [[
>>>>>> When selecting the operator definition for a given set of parameters,
>>>>>> the definition with the most specific parameters applies. For
>>>>>> instance, when evaluating xsd:integer = xsd:signedInt, the definition
>>>>>> for = with two numeric parameters applies, rather than the one with
>>>>>> two RDF terms. The table is arranged so that upper-most viable
>>>>>> candidate is the most specific.
>>>>>> ...
>>>>>> A != B  numeric       numeric       fn:not(op:numeric-equal(A, B))
>>>>>> A != B  xsd:boolean   xsd:boolean   fn:not(op:boolean-equal(A, B))
>>>>>> A != B  xsd:dateTime  xsd:dateTime  fn:not(op:dateTime-equal(A, B))
>>>>>> ...
>>>>>> A != B  RDF term      RDF term      fn:not(RDFterm-equal(A, B))
>>> PROPOSED: change the paragraph to:
>>>
>>> [[
>>> 11.3 Operator Mapping
>>>
>>> The SPARQL grammar identifies a set of operators (for instance, &&, *,
>>> isIRI) used to construct constraints. The following table associates
>>> each of these grammatical productions with the appropriate operands
>>> and an operator function defined either by XPath or the SPARQL
>>> operators specified in section 11.4.
>>> Operators invoked without appropriate operands result in a type error.
>> This paragraph precludes adding new datatypes.  There are quite a few date
>> representations around so I think it is desirable to have a scheme that
>> allows existing data to be mapped to RDF and used with SPARQL without
>> needing a transformation of datatypes.
>
> Agreed. This takes the conservative approach of defining a single
> SPARQL language, rather than a set of languages that have a minimal
> subset and have a prescribed behavior for certain extensions. The
> question is whether the implementation that understands roman:numerals
> is still SPARQL, or just some benevolent extension.
>
> At present, "implementation" occurs once in the normative part of the
> document:
> [[
> RDF Concepts and Abstract Syntax "anticipates an RFC on
> Internationalized Resource Identifiers. Implementations may issue
> warnings concerning the use of RDF URI References that do not conform
> with [IRI draft] or its successors."
> ]]
>
> If we want to make SPARQL specifically embrace (or tolerate) certain
> extensions, we have lots of work to do outside of this text. Unless it
> is patently clear that this must affect our strategy, I propose that
> we handle it separately (i.e. wordsmith everything at the same time).
>
> For example, 11.1 Operand Data Types currently enumerates the
> datatypes in the SPARQL language. We'll have to scratch our heads and
> decide if we want to continue with DanC's don't-say-implementation
> approach, or start to define compatibility levels.
>
>
>>> SPARQL follows XPath's scheme for numeric type promotions and subtype
>>> substitution for arguments to numeric operators. The XPath Operator
>>> Mapping rules for operands of type {xs:integer, xs:decimal, xs:float,
>>> xs:double} or any derivative types apply to SPARQL operators as well.
>>> For instance, when evaluating xsd:integer = xsd:signedInt, the
>>> definition for = with two numeric parameters applies.
>>> Some of the operators are associated with nested function expressions,
>>> e.g. fn:not(op:numeric-equal(A, B)). Note that per the XPath
>>> definitions, fn:not and op:numeric-equal produce an error if their
>>> argument is an error.
>>> ]]
>>>
>>> This is a little more formal than the current text, but does not go so
>>> far as to quote the relevant 3 paragraphs and table from
>>> http://www.w3.org/TR/xpath20/#mapping .
>>>
>>> We also need to change the operators for RDFterm-equal(A, B):
>>> ('-' and '+' indicate rows removed from, and added to the table.)
>>> [[
>>> - A = B   RDF term    RDF term    RDFterm-equal(A, B)
>>> - A != B  RDF term    RDF term    fn:not(RDFterm-equal(A, B))
>>> + A = B   IRI         IRI         RDFterm-equal(A, B)
>>> + A = B   blank node  blank node  RDFterm-equal(A, B)
>>> + A != B  IRI         IRI         RDFterm-equal(A, B)
>>> + A != B  blank node  blank node  RDFterm-equal(A, B)
>> fn:not(RDFterm-equal(A, B)) ?
>
> yeah, what you said.
>
>>> + sameLITERAL(A, B)  literal  literal  RDFterm-equal(A, B)
>>> ]]
>>>
>>> and in grammar rule 57, we add:
>>> [[
>>> | 'sameLITERAL' '(' Expression ',' Expression ')'
>>> ]]
>>>
>>> I think this is what Andy meant by "Be explicit about value spaces".
>> Not really.  I don't see any discussion of value spaces.  If those are the
>> only changes, the text still precludes adding understanding of a new
>> datatype (examples: xsd:date of ISO 8601 date and time) and having the
>> syntax for "<" work.  A design that does work for "<" can also work for "="
>> and "!="
>>
>> Examples:
>>
>> SELECT ?x { ?x :p ?v } ORDER BY ?v
>> or
>> SELECT ?x { ?x :p ?v . ?x :q ?w . FILTER ( ?v < ?w ) }
>>
>> for ?v, ?w being xsd:dates or ISO 8601 dates (including mixed).
>>
>> The fact that
>>
>> SELECT ?x { ?x :p ?v . ?x :q ?w . FILTER ( ?v = ?w ) }
>>
>> works only for a fixed set of types is rather confusing.
>
> "Confusing"? or "limiting"...
>
>
>> The point about being explicit about value spaces is to make the set of
>> datatypes a processor can provide be an open set, not a closed set, if it
>> follows the principle that a processor tests based on compatible value
>> spaces and only returns true/false if it definitely knows that relationship
>> to be true/false.  That open set can be smaller or larger than the current
>> list.
>>
>> "<" maps to sop:value-compare(A,B) == -1
>> and
>> sop:value-compare(numeric,numeric) == -1
>> maps to op:numeric-less-than(A, B)  (well - fn:numeric-compare would be
>> nice!)
>
> Aha! If I understand, value-compare is not intended to solve the
> syntactic ambiguity of the current '=' and '!=' operators, but instead
> to provide an alternate operator function that is not constrained to
> existing XML Schema datatypes as the F&O operator functions are.
>
> Am now quite convinced this is separate from the monotonicity issue.
>
>>> I have not done a survey of what tests would need to change; any that do
>>> a positive literal comparison based on an invocation of RDFterm-equal
>>> (as currently implied by '=' or '!=' on literals that are not both of
>>> known data types).
>>>
>>>
>>> In general, I think this text follows XPath nicely, and provides more
>>> clarity for implementors. Users have to get over having to explicitly
>>> state when they want tests to not depend on data type support, but I
>>> think I've shown that that is necessary to meet our monotonicity
>>> constraints (though I didn't include "Q.E.D." as PatH had advised).
>> Sorry - I don't understand that.  Why is it necessary to have sameLiteral
>> to meet the monotonicity goal, rather than being one way to meet the
>> monotonicity goal?
>
> I meant that I had proven that users have to "explicitly state when
> they want tests to not depend on data type support." This is handled
> by the operator mapping and the change to the grammar.
> There are other ways to change the operator mapping and grammar to meet
> this goal.
>
>> Specifically, in what way does the use of sop:value-compare, which is
>> defined to be an error when the comparison can not be done on some pair of
>> values, violate monotonicity?  What's the counter-example?
>>
>> For the Roman numeral example below,
>>
>> sop:value-compare("II", 2) is either true or error.
>>
>> That means that applying fn:not will not come up with a non-monotonic
>> situation because fn:not(error) is still an error and an error means
>> something is not included in the result set.
>
> But the person is not typing "sop:value-compare('II', 2)", they are
> typing "'II' = 2", which, unless we make the syntactic distinction I
> proposed, could mean RDFterm-equal('II', 2), which should simply be
> false in the naive case, and true in the more clever case.
>
>> [
>> By the way:
>>
>> http://www.w3.org/TR/xmlschema-2/#decimal-lexical-representation
>> Decimal is limited to 0-9, ., + and - so Roman numerals can't be a
>> restriction.
>> ]
>
> rats! it is such a handy example. For the purposes of our discussion,
> let's ignore this.
>
>> Andy
>>
>>>>>> The naive implementation sees
>>>>>> "2"^^xsd:integer != "II"^^roman:numeral
>>>>>> and says "are they both numerics? no, boolean? no ... RDF terms? yes"
>>>>>> and does the RDFterm-equal test. They are not the same term so the
>>>>>> answer is TRUE (remember, *not* equal).
>>>>> OK, I agree this is broken as written, but then this also seems to be
>>>>> at odds with test 6 in that test suite. So I guess my point is,
>>>>> regardless of what the spec currently says, those tests illustrate
>>>>> what the right behavior OUGHT to be, which would be that a != between
>>>>> two literals with unknown datatypes is simply unknown, and can never
>>>>> succeed, regardless of the RDF term equality result between them.
>>>>> So, reverting now to my very limited action item, I don't need to tweak
>>>>> those tests or add to them in order to show what the result SHOULD
>>>>> be. Right?
>>>>>
>>>>>> Some wise-guy adds support for roman:numeral to make the omniscient
>>>>>> implementation from the following schema (note: restriction of decimal):
>>>>>> <xs:simpleType name="numeral" id="numeral">
>>>>>>   <xs:restriction base="xs:decimal">
>>>>>>     <xs:fractionDigits fixed="true" value="0"
>>>>>>       id="romanNumeral.fractionDigits"/>
>>>>>>     <xs:pattern value="[IVDXLC]+"/>
>>>>>>     <xs:minInclusive value="0" id="romanNumeral.minInclusive"/>
>>>>>>   </xs:restriction>
>>>>>> </xs:simpleType>
>>>>>>
>>>>>> Now the implementation says "are they both decimals? yep" and returns
>>>>>> FALSE (II is *not* != 2), causing us to lose an answer that we had in
>>>>>> the naive implementation.
>>>>>>
>>>>>>
>>>>>>> But this is not what the test examples indicate. With this rule, in
>>>>>>> case #6, it would give the answer binding [ x/x1, v/"b"^^t:type1 ],
>>>>>>> but in fact it does not: it gives no answers, as it should in order
>>>>>>> to be monotonic when more datatype information is available. And the
>>>>>>> comment on text 6 seems to indicate that 'no result' is determined
>>>>>>> in this case for reasons of preserving monotonicity, and works
>>>>>>> symmetrically for equality and not-equality.
>>>>>> I believe that this test does illustrate the problem. I can concoct a
>>>>>> type system where the two are, in cleverer systems, known to be the
>>>>>> same value.
>>>>> Right, and in that case - following now the behavior indicated by the
>>>>> example, not by the spec text you cite - the behavior will be
>>>>> indistinguishable from what it is now (no answers) but if you instead
>>>>> concoct a system in which they have different values, then the query
>>>>> will succeed. So either way, we get monotonic behavior.
>>>>> Again, note I am not following the first-line-in-table rule here, but
>>>>> the behavior as specified in the test suite email: they give different
>>>>> results on text 6.
>>>>>
>>>>> So, if we follow the rule as illustrated by test 6, which as I read
>>>>> the test is that when either of A or B is typed with an unknown
>>>>> datatype, then A != B test always fails while A=B succeeds only when
>>>>> A and B are the exact same literal string and same datatype URI, then
>>>>> we don't need to do anything about extending the equality. Right?
>>>>>
>>>>> Pat
>>>>>
>>>>>> Therefore, we need to spell it
>>>>>>
>>>>>> SELECT *
>>>>>> { ?x :p ?v
>>>>>>   FILTER ( ?v !sameLiteral "a"^^t:type1 )
>>>>>> }
>>>>>>
>>>>>> or something like this.
>>>>>>
>>>>>>> So, either the tests are OK, or I have misunderstood your point.
>>>>>>>
>>>>>>> Eric? Or indeed, anyone with anything useful to say?
>>>>>>>
>>>>>>> Pat
>>>>>>> --
>>>>>>> ---------------------------------------------------------------------
>>>>>>> IHMC                  (850)434 8903 or (650)494 3973   home
>>>>>>> 40 South Alcaniz St.  (850)202 4416   office
>>>>>>> Pensacola             (850)202 4440   fax
>>>>>>> FL 32502              (850)291 0667   cell
>>>>>>> phayesAT-SIGNihmc.us  http://www.ihmc.us/users/phayes
>>>>>>>
>>>>>> --
>>>>>> -eric
>>>>>>
>>>>>> home-office: +1.617.395.1213 (usually 900-2300 CET)
>>>>>>              +33.1.45.35.62.14
>>>>>> cell:        +33.6.73.84.87.26
>>>>>>
>>>>>> (eric@w3.org)
>>>>>> Feel free to forward this message to any list for any purpose other than
>>>>>> email address distribution.
>
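
The telecon table near the top of this message (IRI/bNode/Literal equality, sameNode, isError) and the naive-versus-clever monotonicity concern from the quoted thread can be illustrated with a small sketch. Everything here is an assumption made for illustration: terms are modelled as tuples, ERROR stands in for an evaluation error, roman_to_int is a hypothetical parser for the thread's roman:numeral example, and is_error takes an already-evaluated result rather than an expression.

```python
ERROR = "error"  # stand-in for a SPARQL evaluation error (assumption)


def same_node(a, b):
    """Syntactic identity of two terms (tuples compare field by field)."""
    return a == b


def make_literal_compare(parsers):
    """Build a literal/literal value test for a given set of datatypes."""
    def compare(a, b):
        _, lex_a, dt_a = a
        _, lex_b, dt_b = b
        if dt_a in parsers and dt_b in parsers:
            return parsers[dt_a](lex_a) == parsers[dt_b](lex_b)
        if same_node(a, b):
            return True          # identical lexical form and datatype
        return ERROR             # the "E" cell: the processor does not know
    return compare


def equals(a, b, literal_compare):
    """The telecon table: rows and columns are IRI, bNode, Literal."""
    if a[0] != b[0]:
        return False                   # off-diagonal cells are F
    if a[0] in ("iri", "bnode"):
        return same_node(a, b)         # T or F
    return literal_compare(a, b)       # E, T or F


def is_error(result):
    """isError as suggested by Fred: true only for an error result."""
    return result == ERROR


def roman_to_int(s):
    """Hypothetical value mapping for the roman:numeral example."""
    vals = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500}
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        v = vals[ch]
        total += -v if vals.get(nxt, 0) > v else v
    return total


naive = make_literal_compare({"xsd:integer": int})
clever = make_literal_compare({"xsd:integer": int,
                               "roman:numeral": roman_to_int})

two = ("literal", "2", "xsd:integer")
ii = ("literal", "II", "roman:numeral")
print(equals(ii, two, naive), equals(ii, two, clever))
```

Under these assumptions the naive processor reports an error for "II" = 2 and the clever one reports true, so adding datatype support only adds answers and never rescinds one.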
Received on Wednesday, 9 August 2006 09:04:34 UTC