Value testing: implementing str()

I was working through implementing and testing str(): it's defined as:

"""
Returns an xs:string representation of an r:URI. This is useful for examining 
parts of a URI, for instance, the hostname.
"""

and the grammar production is:

'str' '(' VarOrLiteralAsExpr ')'

(see below -- this may be insufficient)

At the Finland F2F we said:
[[[
RESOLVED: to use str(fn) to map URIs (and other things) to string and =~ 
does not implicitly cast URIs to strings (nor do other operators). KendallC 
  abstaining
]]]

We have 5 kinds of things to consider:

1/ URIs
2/ bNodes
3/ Plain literals
4/ Typed literals
5/ Expressions

Unbound variables are covered by the rule that the argument is a value 
expression, so an unbound variable is an evaluation error.

1/ URIs

str(<http://example.org/wombat>) => "http://example.org/wombat"
   That's a plain string, right?

str(<wombat>) => "http://example.org/wombat"

Relative URIs are resolved during parsing, so if the base is 
http://example.org/sheep.html, we get the above.
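A minimal Python sketch of this point. It assumes that parse-time resolution behaves like RFC 3986 reference resolution, with Python's `urllib.parse.urljoin` standing in for the parser. The helper names `resolve` and `str_of_uri` are hypothetical. Because resolution has already happened, str() never sees a relative URI.

```python
from urllib.parse import urljoin

# Base URI assumed for this example, as in the text.
BASE = "http://example.org/sheep.html"


def resolve(iri_ref: str, base: str = BASE) -> str:
    """Resolve a (possibly relative) URI reference, as a parser would."""
    return urljoin(base, iri_ref)


def str_of_uri(uri: str) -> str:
    """str() on a URI term: its (already absolute) string form."""
    return uri


# <wombat> and <http://example.org/wombat> denote the same URI after parsing,
# so str() gives the same string for both.
print(str_of_uri(resolve("wombat")))
print(str_of_uri(resolve("http://example.org/wombat")))
```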


2/ bNodes

Evaluation error, though returning "" would also seem possible. The choice 
should be whatever is most convenient when handling a result set in which a 
variable is sometimes bound to a URI and sometimes to a bNode.
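The sketch below shows the two options side by side. The term classes and function names are hypothetical. It assumes that an evaluation error inside a constraint rejects the solution. Under that assumption, the options diverge for a test such as str(?x) = "": the strict option rejects the bNode row, and the lenient option keeps it.

```python
from dataclasses import dataclass
from typing import Union


class EvalError(Exception):
    """An evaluation error in a value expression."""


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


Term = Union[URI, BNode]


def str_strict(term: Term) -> str:
    """Option 1: str() of a bNode is an evaluation error."""
    if isinstance(term, BNode):
        raise EvalError("str() is not defined on blank nodes")
    return term.value


def str_lenient(term: Term) -> str:
    """Option 2: str() of a bNode is the empty string."""
    if isinstance(term, BNode):
        return ""
    return term.value


def keep_if(solutions, test):
    """Keep solutions passing `test`; an evaluation error rejects one (assumed)."""
    kept = []
    for s in solutions:
        try:
            if test(s):
                kept.append(s)
        except EvalError:
            pass
    return kept


# A column where ?x is sometimes a URI and sometimes a bNode.
column = [URI("http://example.org/a"), BNode("b0")]
print([str_lenient(t) for t in column])
# A test like str(?x) = "" behaves differently under the two options:
print(keep_if(column, lambda t: str_strict(t) == ""))
print(keep_if(column, lambda t: str_lenient(t) == ""))
```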


3/ Plain literals

str("wombat") => "wombat" -- a plain literal

Or xsd:string?  Which?  As we can't cast to a plain literal (??), it had 
better be a plain literal.


4/ Typed literals

str("foo"^^ns:myType) => "foo"

I suggest this evaluates to the lexical form of the typed literal.  Then 
datatype(), str() and lang() on typed literals are accessors into typed 
literals.
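Under this suggestion, str(), datatype() and lang() are plain field accessors on a literal. The sketch below uses a hypothetical `Literal` class. The namespace `http://example.org/ns#` stands in for `ns:`, and it is also hypothetical. Having lang() return "" when there is no language tag is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

NS = "http://example.org/ns#"  # hypothetical namespace standing in for ns:


@dataclass(frozen=True)
class Literal:
    lexical: str                    # the lexical form, exactly as written
    datatype: Optional[str] = None  # datatype URI; None for a plain literal
    lang: Optional[str] = None      # language tag, plain literals only


def sparql_str(lit: Literal) -> str:
    return lit.lexical


def sparql_datatype(lit: Literal) -> Optional[str]:
    return lit.datatype


def sparql_lang(lit: Literal) -> str:
    # Assumption: "" when there is no language tag (always so for typed literals).
    return lit.lang or ""


foo = Literal("foo", NS + "myType")
print(sparql_str(foo), sparql_datatype(foo), repr(sparql_lang(foo)))
```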

str(1) => str("1"^^xsd:integer) => "1"

str("001"^^xsd:integer) => "001"

Note that the non-canonical lexical form is preserved, to be consistent with 
the handling of datatypes in general.
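In the sketch below, a literal keeps its lexical form exactly as written, and its value is computed only when something asks for it. So "1" and "001" compare equal as integers but give different str() results. All class and function names are hypothetical. Treating the bare token 1 as "1"^^xsd:integer follows the example above.

```python
import re
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str
    datatype: str


def parse_integer_token(token: str) -> TypedLiteral:
    """A bare numeric token such as 1 is shorthand for "1"^^xsd:integer."""
    return TypedLiteral(token, XSD_INTEGER)


def integer_value(lit: TypedLiteral) -> int:
    """Value of an xsd:integer literal (decimal digits, optional sign)."""
    if lit.datatype != XSD_INTEGER or not re.fullmatch(r"[+-]?[0-9]+", lit.lexical):
        raise ValueError(f"not a valid xsd:integer: {lit!r}")
    return int(lit.lexical)


def sparql_str(lit: TypedLiteral) -> str:
    return lit.lexical  # as written; no canonicalization


one = parse_integer_token("1")
padded = TypedLiteral("001", XSD_INTEGER)
print(sparql_str(one), sparql_str(padded))          # different strings...
print(integer_value(one) == integer_value(padded))  # ...same value
```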

Because of this, I have removed hex literals from the grammar.  They could 
be handled as second-class citizens:

0xFF => "255"^^xsd:integer

not preserving the lexical form.  Is this right by XSD?
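A sketch of hex treated as a second-class token, using a hypothetical function name. In XML Schema, the lexical space of xsd:integer is decimal digits with an optional sign. "0xFF" is therefore not itself a valid xsd:integer lexical form, so the sketch re-serializes the value in decimal instead of keeping what was written.

```python
import re
from typing import Tuple

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def hex_token_to_literal(token: str) -> Tuple[str, str]:
    """Turn a hex token such as 0xFF into (lexical form, datatype URI).

    The written form "0xFF" is not kept: the value is re-serialized as a
    decimal xsd:integer lexical form.
    """
    m = re.fullmatch(r"0[xX]([0-9A-Fa-f]+)", token)
    if m is None:
        raise ValueError(f"not a hex token: {token!r}")
    return str(int(m.group(1), 16)), XSD_INTEGER


print(hex_token_to_literal("0xFF"))
```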

str("abc"^^xsd:integer) => "abc"

No check is made on the validity of XSD-typed literals at this level.
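One way to read this, sketched below, assumes that validity is checked only when an operation needs the value, such as arithmetic. The accessor itself never validates. The names are hypothetical.

```python
import re
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class EvalError(Exception):
    """An evaluation error in a value expression."""


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str
    datatype: str


def sparql_str(lit: TypedLiteral) -> str:
    """Accessor: return the lexical form; nothing is validated here."""
    return lit.lexical


def integer_value(lit: TypedLiteral) -> int:
    """Only operations that need the value check the lexical form."""
    if lit.datatype != XSD_INTEGER or not re.fullmatch(r"[+-]?[0-9]+", lit.lexical):
        raise EvalError(f"ill-formed xsd:integer: {lit.lexical!r}")
    return int(lit.lexical)


bad = TypedLiteral("abc", XSD_INTEGER)
print(sparql_str(bad))  # the accessor still works
try:
    integer_value(bad)
except EvalError as err:
    print("value needed:", err)
```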


5/ Expressions

These are banned by the grammar production but at least one is reasonable:

str(datatype(1)) => "http://www.w3.org/2001/XMLSchema#integer"

As datatype is an accessor returning a URI (type rdf:uri in table 11.1 - 
Eric, elsewhere it's an r:URI), it seems reasonable to me that this 
expression is legal.
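The sketch below shows why the composition is harmless. datatype() yields a URI term, and str() of a URI is just its string. The term classes and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str
    datatype: URI


Term = Union[URI, TypedLiteral]


def datatype(lit: TypedLiteral) -> URI:
    """Accessor: the datatype of a typed literal, as a URI term."""
    return lit.datatype


def sparql_str(term: Term) -> str:
    if isinstance(term, URI):
        return term.value
    if isinstance(term, TypedLiteral):
        return term.lexical
    raise TypeError(f"unsupported term: {term!r}")


one = TypedLiteral("1", URI(XSD + "integer"))  # the token 1
print(sparql_str(datatype(one)))
```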

If the production is

'str' '(' Expression ')'

other things become legal syntax:

str(?x + 1)

That is well defined, if unlikely, because "+" evaluates to an 
xsd:integer or xsd:double and so has a lexical form.  From my implementation 
experience, it is less work to allow a general expression than to 
special-case it: otherwise I'd need to check that programmatically created 
queries are valid when printed.
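Allowing a general expression inside str() costs an evaluator very little, because str() just evaluates its argument and takes the lexical form of the result. The sketch below uses a tiny expression tree with hypothetical classes `Var`, `Const`, `Add` and `Str`. It handles only xsd:integer addition, not xsd:double. A computed sum has no written form, so the sketch gives it the canonical decimal form.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class EvalError(Exception):
    """An evaluation error in a value expression."""


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None means a plain literal


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    term: Literal


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Str:
    arg: "Expr"


Expr = Union[Var, Const, Add, Str]


def as_int(lit: Literal) -> int:
    if lit.datatype != XSD_INTEGER or not re.fullmatch(r"[+-]?[0-9]+", lit.lexical):
        raise EvalError(f"not an xsd:integer: {lit!r}")
    return int(lit.lexical)


def evaluate(expr: Expr, binding: Dict[str, Literal]) -> Literal:
    if isinstance(expr, Var):
        if expr.name not in binding:
            raise EvalError(f"unbound variable ?{expr.name}")
        return binding[expr.name]
    if isinstance(expr, Const):
        return expr.term
    if isinstance(expr, Add):
        total = as_int(evaluate(expr.left, binding)) + as_int(evaluate(expr.right, binding))
        # Python's str(int) matches the canonical xsd:integer form.
        return Literal(str(total), XSD_INTEGER)
    if isinstance(expr, Str):
        # str() of any literal result: its lexical form, as a plain literal.
        return Literal(evaluate(expr.arg, binding).lexical)
    raise TypeError(f"unknown expression: {expr!r}")


# str(?x + 1) with ?x bound to "001"^^xsd:integer
query = Str(Add(Var("x"), Const(Literal("1", XSD_INTEGER))))
print(evaluate(query, {"x": Literal("001", XSD_INTEGER)}))
```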

A corollary of this is that extension functions should be allowed to return 
values, not just booleans, making functions and casting syntactically 
identical.
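If extension functions can return values, a cast such as xsd:integer(...) and an extension function call have the same form: a function URI applied to arguments. The sketch below assumes a registry keyed by function URI. `register`, `call` and the extension URI `http://example.org/fn#twice` are hypothetical. It also assumes that a cast produces a new value and therefore gets the canonical lexical form.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None means a plain literal


Function = Callable[[List[Literal]], Literal]
REGISTRY: Dict[str, Function] = {}


def register(iri: str) -> Callable[[Function], Function]:
    """Decorator: make the function callable as <iri>(...) in a query."""
    def deco(fn: Function) -> Function:
        REGISTRY[iri] = fn
        return fn
    return deco


def parse_int(lexical: str) -> int:
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return int(lexical)


@register(XSD + "integer")  # a cast is just another function
def cast_integer(args: List[Literal]) -> Literal:
    (arg,) = args
    return Literal(str(parse_int(arg.lexical)), XSD + "integer")


@register("http://example.org/fn#twice")  # hypothetical extension function
def twice(args: List[Literal]) -> Literal:
    (arg,) = args
    return Literal(str(2 * parse_int(arg.lexical)), XSD + "integer")


def call(iri: str, args: List[Literal]) -> Literal:
    """Evaluate <iri>(args...) the same way for casts and extensions."""
    if iri not in REGISTRY:
        raise LookupError(f"unknown function <{iri}>")
    return REGISTRY[iri](args)


print(call(XSD + "integer", [Literal("007")]))
print(call("http://example.org/fn#twice", [Literal("21", XSD + "integer")]))
```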



Proposal:

1/ str(typed literal) returns the lexical form of the typed literal, without 
canonicalization
2/ str(bNode) is an evaluation error
3/ str(expression) is legal - change the grammar production
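Taken together, the three points could be implemented roughly as below. This is a sketch under several assumptions. The term classes and names are hypothetical, and str() is modeled as returning a plain literal. The argument is taken to be an already-evaluated expression result (point 3), so only the dispatch on the kind of term is shown.

```python
from dataclasses import dataclass
from typing import Optional, Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class EvalError(Exception):
    """An evaluation error in a value expression."""


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None means a plain literal


Term = Union[URI, BNode, Literal]


def sparql_str(term: Term) -> Literal:
    """str() as proposed; the result is a plain literal."""
    if isinstance(term, URI):
        return Literal(term.value)
    if isinstance(term, BNode):
        raise EvalError("str() of a bNode")  # point 2
    if isinstance(term, Literal):
        return Literal(term.lexical)  # point 1: lexical form as written
    raise TypeError(f"not an RDF term: {term!r}")


print(sparql_str(URI("http://example.org/wombat")))
print(sparql_str(Literal("001", XSD_INTEGER)))
```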

The only other choice I see is that str() applies (syntactically) only to a 
variable, but if we have value-based constraints, this is rather odd.

 Andy

Received on Sunday, 20 February 2005 14:00:39 UTC