Re: test for DATATYPE on plain literals, please

Eric Prud'hommeaux wrote:
> On Mon, Nov 14, 2005 at 02:08:34PM +0000, Seaborne, Andy wrote:
> 
>>
>>
>>Dan Connolly wrote:
>>
>>>"It returns <xsd:string> if the argument is an untyped literal."
>>>-- http://www.w3.org/2001/sw/DataAccess/rq23/#func-datatype
>>>
>>>
>>>SPARQLer doesn't seem to agree; I tried this query:
>>>
>>>SELECT ?book ?title WHERE { ?book dc:title ?title FILTER
>>>( DATATYPE(?title)) }
>>>
>>>http://sparql.org/books?query=PREFIX+books%3A+++%3Chttp%3A%2F%
>>>2Fexample.org%2Fbook%2F%3E%0D%0APREFIX+dc%3A++++++%3Chttp%3A%2F%
>>>2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%0D%0ASELECT+%3Fbook+%3Ftitle
>>>+WHERE+%7B+%3Fbook+dc%3Atitle+%3Ftitle+FILTER+%28+DATATYPE%28%3Ftitle%
>>>29%29+%7D%0D%0A&stylesheet=xml-to-html.xsl
>>>
>>>and got no results.
>>
>>ARQ 1.0 is broken in this area - I've rewritten the whole area.  But your 
>>query still returns nothing :-(
>>
>>
>>>I suspect SPARQLer is not quite caught up there. To make sure
>>>developers learn about this detail, let's make sure there's a test,
>>>please.
>>>
>>>Steve, would you please add a test or get somebody to do it?
>>>Or if one is already there, point me to it?
>>>
>>>I looked, and the closest I see is something with doubles.
>>>http://www.w3.org/2001/sw/DataAccess/tests/#datatype-1
>>>
>>
>>There are several things going on here as well:
>>
>>1/ Whether DATATYPE(?title) should return xsd:string
>>
>>2/ The EBV rules
>>
>>3/ Whether EBV applies at all.
>>
>>- - - - - -
>>
>>1/ Whether DATATYPE(?title) => xsd:string
>>
>>It's a D-entailment:
>>http://www.w3.org/TR/rdf-mt/#DtypeRules
>>
>>xsd 1a  uuu aaa "sss".    => uuu aaa "sss"^^xsd:string .
>>xsd 1b  uuu aaa "sss"^^xsd:string .  => uuu aaa "sss".
>>
>>that equates plain literals and xsd:strings.  It seems reasonable to treat 
>>this like any other XSD vocabulary interpretation but I see datatype() as 
>>acting on the syntax to extract the datatype of the literal, not acting on 
> 
> 
> This seems like a perfectly reasonable principle, but it was tricky to
> say what could be returned that was either an IRI or an error if the
> argument had no datatype. The consistency checking use case came up,
> but I decided to relegate that to extension functions. Downside is
> that you lose interop on, and maybe ubiquity of deployment of,
> my:fussyDatatype().

Why is that difficult to say?

datatype(<iri>) => type error
datatype([]) => type error

datatype("abc") => error
datatype("abc"^^:someType) => @someType

It's a ^^ accessor - this one seems like the easiest of the three accessors.
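
A minimal sketch of the purely syntactic accessor described above, in Python. The term classes and the SparqlTypeError name are hypothetical, invented for illustration; the sketch also collapses "type error" and "error" into one exception class, which is an assumption rather than anything rq23 says.

```python
from dataclasses import dataclass
from typing import Optional, Union


class SparqlTypeError(Exception):
    """Hypothetical stand-in for a SPARQL (type) error."""


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[IRI] = None  # None models a plain (untyped) literal


Term = Union[IRI, BNode, Literal]


def datatype(term: Term) -> IRI:
    """Syntactic ^^ accessor: no xsd:string default for plain literals."""
    if not isinstance(term, Literal):
        # datatype(<iri>) and datatype([]) are both errors
        raise SparqlTypeError("datatype() expects a literal")
    if term.datatype is None:
        # datatype("abc") is an error under this reading
        raise SparqlTypeError("plain literal has no datatype")
    return term.datatype
```

Under this reading an application can tell "abc" and "abc"^^xsd:string apart simply by whether the call raises.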

str(literal) is a lexical form accessor.
str(<uri>) is an overloaded form of str/1.  It's really a different function.


lang("") => error also works for me though "" is natural because of 
xml:lang="" - at least it's false.

FILTER (isLiteral(?v) && lang(?v))

"" only works here because "" isn't a legal langtag.
This does not apply to datatype because there is no illegal IRI.
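
As a rough illustration of why "" works as a sentinel for lang() in the FILTER above, here is a small Python sketch. The tuple model of a literal and the helper names are assumptions made for the example only.

```python
from typing import Optional, Tuple

# Hypothetical model: (lexical form, language tag or None when there is no tag).
LangLiteral = Tuple[str, Optional[str]]


def lang(lit: LangLiteral) -> str:
    """Return the language tag, or "" when the literal has none."""
    _, tag = lit
    return tag or ""


def has_language(lit: LangLiteral) -> bool:
    """Effective boolean value of lang(lit): a zero-length string is false."""
    return lang(lit) != ""
```

Because "" can never be a real language tag, the false case is unambiguous. No IRI plays the same role for datatype(), which is the asymmetry noted above.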


> 
> The current wording says that it's syntactic plus the
> ""->""^^xs:string :
> [[
> Returns the datatype of arg if arg is a typed literal. It returns
> <xsd:string> if arg is an untyped literal.
> ]]
> 
> Nothing in there says that "foo"^^subtype implies "foo"^^supertype.
> 11.1.1 Type Promotion [PROM] describes the type promotion mechanism
> for finding an appropriate numeric function but keeps the datatype
> pristine:
> [[
> Promotion does not change the bindings of variables.
> ]]
> 
> Do we need more text specifically steering folks away from potential
> misconception?
> 
> 
>>entailments in the graph.  Concretely, in a graph with both untyped 
>>literals and literals with xsd:strings, how can the application sort them 
>>out for, say, consistency checking?  Raising an error on a plain literal 
>>seems more consistent than returning xsd:string
>>
>>- - - - - -
>>
>>2/ EBV rules
>>
>>As Steve pointed out, the boolean effective value rules are "default true".
>>
>>I have implemented Boolean Effective Value to test for the enumerated cases 
>>in rq23 but on unknown types and things that aren't literals at all it 
>>raises an error.  As written, the rules say some sort of coercion is always 
>>possible - what if there is an error raised?
>>
>>ARQ (in CVS) returns xsd:string on plain literals so still rejects all 
>>solutions. Maybe it should return false (xsd:string is known not to be 
>>xsd:boolean value true) - ARQ behind SPARQL is the 1.0 release and is buggy 
>>in this area.
>>
>>I would prefer the BEV rules to say something like:
>>
>>-----------------------------
>>
>>11.2.2 Effective Boolean Value
>>
>>When an operand is coerced to xsd:boolean through invoking a function that 
>>takes a xsd:boolean argument, the following rules apply:
>>
>>If the operand is not a literal, a type error is generated.
>>
>>If the literal is known to be the value true the result is true.
>>
>>If the literal is of unknown datatype, the result is false.
>>
>>If the literal is an XSD datatype then the result is TRUE unless any of the 
>>following are true:
>>
>>    * The operand is unbound.
>>    * The operand is an xsd:boolean with a FALSE value.
>>    * The operand is a 0-length untyped RDF literal or xsd:string.
>>    * The operand is any numeric type with a value of 0.
>>    * The operand is an xsd:double or xsd:float with a value of NaN
>>-----------------------------
> 
> 
> The EBV rules come from XPath's EBV [XEBV] which leans on the
> fn:boolean constructor in F&O [BOOL]. The SPARQL ones have the
> sequence rules removed, resulting in two TRUE cases being added:
> IRI and bNode.

At a minimum, I think that there should be explicit rules for these two (and 
they should be false), and also for literals with a datatype.

The list does not talk about typed literals except some XSD ones.  Under these 
rules, "false"^^:myBoolean is true.
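
To make the "default true" problem concrete, here is a small Python sketch of the rule shape as I read it from the list of false cases quoted earlier in this message. The tuple representation and the function name are assumptions for illustration, not ARQ code and not rq23 text.

```python
import math
from typing import Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumed set of numeric datatypes; the real list is longer.
NUMERIC = {XSD + "integer", XSD + "decimal", XSD + "double", XSD + "float"}

# Hypothetical model: (lexical form, datatype IRI or None for a plain literal).
TypedLiteral = Tuple[str, Optional[str]]


def ebv_default_true(lit: Optional[TypedLiteral]) -> bool:
    """EBV that is true unless one of the enumerated false cases applies."""
    if lit is None:  # unbound operand
        return False
    lexical, dt = lit
    if dt == XSD + "boolean":
        return lexical in ("true", "1")
    if dt is None or dt == XSD + "string":
        return len(lexical) > 0
    if dt in NUMERIC:
        value = float(lexical)
        return not (value == 0 or math.isnan(value))
    # Everything else, including unknown datatypes, falls through to true.
    return True
```

With this shape, a literal with lexical form "false" and an unrecognised boolean-like datatype comes out true, which is the case complained about above.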

See also:
_:a owl:sameAs true .

> Are you still motivated to change this, given that it will fall
> away from the XQuery semantics?

[I'm trying to dig into the underlying design.]

In terms of principles, I'd like to follow XQuery/XPath semantics wherever we 
can.  Where there are things that don't work out in some way, especially where 
it is a clash with a principle we have elsewhere, we have to resolve each clash.

One such clash is that we exclude solutions when a SPARQL processor does not 
positively know it to be true: XPath/XQuery appear to use the principle that 
things should be returned unless excluded.  Both are good principles for their 
use cases.  XPath/XQuery has been informed by XML processing needs; SPARQL 
is driven by the open world assumption.

http://www.w3.org/TR/xpath20/#id-ebv:
[[
2.4.3 Effective Boolean Value

Under certain circumstances (listed below), it is necessary to find the 
effective boolean value of a value. [Definition: The effective boolean value 
of a value is defined as the result of applying the fn:boolean function to the 
value, as defined in [XQuery 1.0 and XPath 2.0 Functions and Operators].]

The effective boolean value of a sequence is computed implicitly during 
processing of the following types of expressions:

     * Logical expressions (and, or)
     * The fn:not function
     * Certain types of predicates, such as a[b]
     * Conditional expressions (if)
     * Quantified expressions (some, every)
     * General comparisons, in XPath 1.0 compatibility mode.

Note:

The definition of effective boolean value is not used when casting a value to 
the type xs:boolean, for example in a cast expression or when passing a value 
to a function whose expected parameter is of type xs:boolean.
]]

which I note says "Under certain circumstances"

     * Logical expressions (and, or)
Yes - EBV is OK although "false"^^:myBoolean is broken.

?x and ?y plain literals
FILTER (?x || ?y) accepts a solution if either is not the empty string. 
Convenient I guess, though I'd be happy to force people to write

FILTER (?x != "" || ?y != "")
cos it's now the same as
FILTER (! (?x = "" && ?y = "") )

     * The fn:not function
Ditto

     * Certain types of predicates, such as a[b]
This is as near to FILTER as it comes.

     * Conditional expressions (if)
N/A in SPARQL v1. I can see it arising in v2.

     * Quantified expressions (some, every)
N/A in SPARQL v1. I can see it arising in v2.

     * General comparisons, in XPath 1.0 compatibility mode.
N/A
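
For the plain-literal case under "Logical expressions" above, the claimed equivalence can be checked exhaustively over empty and non-empty strings. This is only a Python sketch; ebv_plain is a hypothetical helper that models the zero-length-string rule and nothing else.

```python
from itertools import product


def ebv_plain(s: str) -> bool:
    """EBV of a plain literal: false only for the zero-length string."""
    return s != ""


def forms_agree(x: str, y: str) -> bool:
    """Compare FILTER (?x || ?y) with the two explicit spellings."""
    via_ebv = ebv_plain(x) or ebv_plain(y)
    explicit = (x != "") or (y != "")
    de_morgan = not (x == "" and y == "")
    return via_ebv == explicit == de_morgan


# Every combination of empty / non-empty operands.
all_agree = all(forms_agree(x, y) for x, y in product(["", "a"], repeat=2))
```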


We could define FILTER(expression) as having an implicit cast to xsd:boolean. 
FILTER(xsd:boolean(expression)).


By the way - these rules do not apply when casting to boolean and also not 
when passing a value to a function whose expected parameter is of type xs:boolean.
(http://www.w3.org/TR/xpath20/#id-ebv end 2.4.3)

That makes the text in rq23:
[[
When an operand is coerced to xsd:boolean through invoking a function that
takes a xsd:boolean argument,
]]

look to be a contradiction of that.  Similarly the rq23 text:

[[
Functions invoked with an argument of the wrong type (except xsd:boolean) will 
produce a type error. Functions requiring an argument of type xsd:boolean are 
coerced to xsd:boolean using the EBV rules in section 11.2.2 .
]]

It's only fn:not, || and && that take EBVs.


Observation: if EBV had been defined the other way round: i.e. as false except 
when:
     * The operand is an xsd:boolean with a TRUE value.
     * The operand is literal with a non-zero-length lexical form
     * The operand is any decimal type with a value of != 0
     * The operand is an xsd:double or xsd:float with a value != NaN

we'd be much closer.  Still have problems with unknown datatypes that may 
overlap with the boolean value space.
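
A sketch of the inverted, "default false" formulation, again in Python. The names are hypothetical, and the sketch makes two assumptions beyond the list above: the non-zero-length rule is applied only to plain literals and xsd:string, and NaN and zero are both treated as false for every numeric type.

```python
import math
from typing import Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumed set of numeric datatypes; the real list is longer.
NUMERIC = {XSD + "integer", XSD + "decimal", XSD + "double", XSD + "float"}

# Hypothetical model: (lexical form, datatype IRI or None for a plain literal).
TypedLiteral = Tuple[str, Optional[str]]


def ebv_default_false(lit: Optional[TypedLiteral]) -> bool:
    """EBV that is false unless the operand is positively known to be true."""
    if lit is None:  # unbound operand
        return False
    lexical, dt = lit
    if dt == XSD + "boolean":
        return lexical in ("true", "1")
    if dt is None or dt == XSD + "string":
        return len(lexical) > 0
    if dt in NUMERIC:
        value = float(lexical)
        return value != 0 and not math.isnan(value)
    # Unknown datatypes are never positively known to be true.
    return False
```

Note that a literal with lexical form "true" and an unknown boolean-like datatype now comes out false, which is the remaining problem with datatypes that overlap the boolean value space.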

> 
> 
>>- - - - - -
>>
>>3/ Whether EBV applies at all?
>>
>>rq23 says "When an operand is coerced to xsd:boolean through invoking a 
>>function"  FILTER is not a function.  The definition of Value Constraint 
>>applies only on boolean valued expressions so somewhere between the two, we 
>>need to fix the text.
> 
> 
> FILTER operates on an EBV. Any proposed changes to:
> [[
> SPARQL FILTERs restrict the set of solutions according to the given
> expression. Specifically, FILTERs eliminate any solutions that, when
> substituted into the expression, result in either an effective boolean
> value of false or produce a type error. Effective boolean values are
> defined in section 11.2.2 Effective Boolean Value, type error is
> defined in XQuery 1.0: An XML Query Language [XQUERY] section 2.3.1,
> Kinds of Errors.
> ]]

Section 3 and the definition of value constraints need to align with this.
It says that a constraint is a boolean-valued expression.
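
For reference, the FILTER behaviour in the quoted text (drop a solution when the expression has an EBV of false or produces a type error) can be sketched in Python as below. SparqlTypeError, apply_filter and the dict-based solution model are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Iterable, List


class SparqlTypeError(Exception):
    """Hypothetical stand-in for a SPARQL type error."""


Solution = Dict[str, str]  # variable name -> bound value (simplified)


def apply_filter(
    solutions: Iterable[Solution],
    expr: Callable[[Solution], bool],
) -> List[Solution]:
    """Keep only solutions for which expr is positively true."""
    kept = []
    for solution in solutions:
        try:
            if expr(solution):
                kept.append(solution)
        except SparqlTypeError:
            # A type error eliminates the solution, same as false.
            pass
    return kept


def title_is_nonempty(solution: Solution) -> bool:
    """Example constraint: errors on an unbound ?title."""
    if "title" not in solution:
        raise SparqlTypeError("?title is unbound")
    return solution["title"] != ""
```

The point of the sketch is that both the false case and the error case end in the same place, so the constraint only ever lets through what is positively known to be true.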

 Andy

> 
> 
> [PROM] http://www.w3.org/2001/sw/DataAccess/rq23/#promotion
> [XEBV] http://www.w3.org/TR/xpath20/#id-ebv
> [BOOL] http://www.w3.org/TR/xquery-operators/#func-boolean
> [TEST] http://www.w3.org/2001/sw/DataAccess/rq23/#tests

Received on Monday, 14 November 2005 18:02:03 UTC