Re: Exact format for XML Literals? from Bijan Parsia on 2009-09-14 (public-rdf-dawg@w3.org from July to September 2009)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Mon, 14 Sep 2009 12:25:27 +0100
To: Ivan Herman <ivan@w3.org>
Cc: "Seaborne, Andy" <andy.seaborne@hp.com>, Axel Polleres <axel.polleres@deri.org>, W3C SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <66AF563A-341C-40F8-BFA7-D7A94926347F@cs.man.ac.uk>
On 14 Sep 2009, at 11:06, Ivan Herman wrote:

> Hi Bijan,
>
> Bijan Parsia wrote:
>> On 14 Sep 2009, at 05:58, Ivan Herman wrote:
[snip]
> (I just realized that I wanted to use rdf:XMLLiteral in the example.
> Sorry about that...)

No worries.

>>> My feeling is that the answer should be 'true', regardless of the fact
>>> that the two literals differ in the order of the attributes and in
>>> the usage of white space.
>>
>> Since comparisons are normally in "term" space, i.e., lexical space, my
>> feeling is different.
>>
>
> Hm. We really do have different feelings:-).
>
> So if I have the data as
>
> <> ex:b "1.00"^^xsd:float .
>
> then
>
> ASK WHERE { ?a ex:b "1.0"^^xsd:float . }
>
> should return false?

One needs to pick whether one is going to be picky or coercing about  
such matters. XPath, for example, aggressively atomizes, so you can  
be sloppy in lots of places and it'll compare things at "the right"  
level. Perl too (I think).  This is good for some things and bites  
you for others.

I believe graph matching is defined on the literal form, which, if I  
understand it correctly, at least *involves* comparing on the lexical  
form. So yes. Some operators act on the value space, so would behave  
differently. Furthermore, in an entailment regime, the value space  
plays a role.
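To make the term-space vs value-space distinction concrete, here is a minimal Python sketch. It assumes a literal is modeled as a (lexical form, datatype IRI) pair; `term_equal` stands in for graph-pattern matching and `value_equal` for a value-space operator such as `=` in a FILTER. Both names are hypothetical, only xsd:float is handled, and Python's double-precision `float` is used as an approximation of single-precision xsd:float.

```python
from typing import NamedTuple

XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


class Literal(NamedTuple):
    """A typed RDF literal: lexical form plus datatype IRI (hypothetical model)."""
    lexical: str
    datatype: str


def term_equal(a: Literal, b: Literal) -> bool:
    # Graph matching compares RDF terms: same lexical form and same datatype.
    return a == b


def value_equal(a: Literal, b: Literal) -> bool:
    # A value-space comparison maps each lexical form to its value first.
    # Only xsd:float is modeled; Python's float is double precision,
    # which is close enough for this illustration.
    if a.datatype == b.datatype == XSD_FLOAT:
        return float(a.lexical) == float(b.lexical)
    raise TypeError("value comparison not modeled for these datatypes")


data = Literal("1.00", XSD_FLOAT)
pattern = Literal("1.0", XSD_FLOAT)
print(term_equal(data, pattern))   # False: lexical forms differ
print(value_equal(data, pattern))  # True: both denote the value 1.0
```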

> Is it then in the realm of the entailment regimes
> in the sense that it would require D-entailment to be able to say
> 'true'?

Sure. The answer is TRUE under an OWL2 entailment regime, for example.

Consider the following query:

	ASK WHERE { ?a ex:b ?lv . FILTER (?lv = "1.0"^^xsd:float) }

I believe the answer is TRUE there as well. (But false in the prior
one... I hope I'm right; I'm not achieving controlling-spec-text joy at
the moment... still weekended.)

It partially depends, as well, on whether your triple store  
canonicalizes the input.
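A sketch of how that dependence plays out, assuming a hypothetical store that rewrites every xsd:float lexical form to one spelling on load (and applies the same rewrite to constants in the query). The normalization used here, Python's `repr` of the parsed value, is an illustrative stand-in and not the XSD canonical representation; all function names are made up for the example.

```python
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def normalize(obj):
    # Hypothetical normalization: give every xsd:float literal a single
    # spelling (Python's repr of the parsed value). This is NOT the XSD
    # canonical lexical representation; it only collapses spellings.
    if isinstance(obj, tuple) and obj[1] == XSD_FLOAT:
        return (repr(float(obj[0])), XSD_FLOAT)
    return obj


def load(triples, canonicalize):
    # A store either keeps lexical forms exactly as given or normalizes them.
    if canonicalize:
        return {(s, p, normalize(o)) for s, p, o in triples}
    return set(triples)


def ask(store, pred, obj, canonicalize):
    # Models ASK { ?a pred obj }: plain term matching against the store.
    # A normalizing store is assumed to normalize query constants too.
    if canonicalize:
        obj = normalize(obj)
    return any(p == pred and o == obj for _, p, o in store)


# Literal objects are (lexical form, datatype IRI) pairs.
data = [("<>", "ex:b", ("1.00", XSD_FLOAT))]
query_obj = ("1.0", XSD_FLOAT)
print(ask(load(data, False), "ex:b", query_obj, False))  # False
print(ask(load(data, True), "ex:b", query_obj, True))    # True
```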

But also consider the following graph:

	:a :p "abc"^^rdf:XMLLiteral.
	:a :p :b.
	:a :q :c.
	:b rdf:type xsd:positiveInteger.
	:c rdf:type xsd:negativeInteger.

And the following queries:

	ASK WHERE {:a :p ?x. FILTER isLiteral(?x)}
<-- returns TRUE in SPARQL 1.0; under RDF entailment... it's not a literal. I.e.,

	ASK WHERE {:a :p ?x. ?x rdf:type rdfs:Literal}
<-- returns FALSE, but I expect that the first query still returns TRUE.

	ASK WHERE {:a :p ?x. :a :q ?y. FILTER (?x > ?y)}
<-- Not sure what it returns... perhaps an error? Under an appropriate
D-entailment (or OWL Full) it presumably could be true, as :b is
greater than :c.
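For the last query, a minimal Python sketch of plain SPARQL 1.0 evaluation with no entailment, assuming two rules: `>` has no operator mapping for an XMLLiteral or an IRI operand, so it raises a type error, and a FILTER whose expression errors eliminates that solution. Under those assumptions the ASK comes out false rather than raising an error. The term model and all function names are hypothetical simplifications, and only numeric literals are treated as comparable.

```python
from itertools import product

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC = {XSD + "integer", XSD + "decimal", XSD + "float", XSD + "double"}


class SparqlTypeError(Exception):
    """Raised when an operator has no mapping for its operand types."""


def greater_than(x, y):
    # Literals are (lexical form, datatype) tuples; IRIs are plain strings.
    # Simplification: only numeric literals are treated as comparable.
    if (isinstance(x, tuple) and isinstance(y, tuple)
            and x[1] in NUMERIC and y[1] in NUMERIC):
        return float(x[0]) > float(y[0])
    raise SparqlTypeError(f"> is not defined for {x!r} and {y!r}")


def filter_keeps(expr, *args):
    # A FILTER whose expression raises an error eliminates the solution.
    try:
        return expr(*args) is True
    except SparqlTypeError:
        return False


graph = {
    (":a", ":p", ("abc", "rdf:XMLLiteral")),
    (":a", ":p", ":b"),
    (":a", ":q", ":c"),
    (":b", "rdf:type", "xsd:positiveInteger"),
    (":c", "rdf:type", "xsd:negativeInteger"),
}

# Solutions of {:a :p ?x. :a :q ?y} under plain (simple) matching.
xs = [o for s, p, o in graph if s == ":a" and p == ":p"]
ys = [o for s, p, o in graph if s == ":a" and p == ":q"]

# ASK is true iff at least one (?x, ?y) solution survives the FILTER.
answer = any(filter_keeps(greater_than, x, y) for x, y in product(xs, ys))
print(answer)  # False: every candidate comparison is a type error
```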


> That may well be the answer (and we may want to think about this
> when discussing entailment regimes)...

[snip]

> (you seem to have referred to the WD).

Yeah. Sucks to be me.

> I am not arguing at this point whether this is right or wrong (see
> below).

I am :)

> And indeed you are right that no other datatype requires some
> sort of a canonicalization.
>
>> But even if you buy that coming from RDF/XML you'll end up with
>> canonicalized lexical forms, not every source must do that. AFAICT,
>> SPARQL is silent on canonicalization...XMLLiteral is just another
>> datatyped literal. So those would definitely not match.
>>
>
> O.k., I agree with your analysis that SPARQL is silent on that. Then my
> question is, in fact: is this o.k.? Shouldn't SPARQL do the same as
> RDF/XML, which explicitly refers to canonicalization?

No. RDF/XML should change.

> If not, the only way I could get a 'yes' answer to my original question
> would be to canonicalize the whole thing myself, i.e.:

A function could do the job as well.
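As an illustration of the function route, a minimal Python sketch: a hypothetical `xml_literal_equal` that compares two XML literal lexical forms after canonicalizing both with the standard library's `xml.etree.ElementTree.canonicalize` (Python 3.8+). That function implements Canonical XML 2.0, not the Exclusive XML Canonicalization that RDF specifies for rdf:XMLLiteral, so treat it as an approximation. The original literals are snipped above; the strings below are made-up stand-ins that differ only in attribute order, quoting and whitespace. Fragments are wrapped in a dummy element because a literal may contain several top-level nodes.

```python
import xml.etree.ElementTree as ET


def canonical_content(fragment: str) -> str:
    # Wrap the fragment so it parses as a single document, canonicalize,
    # then strip the wrapper tags again.
    wrapped = ET.canonicalize("<wrapper>" + fragment + "</wrapper>")
    return wrapped[len("<wrapper>"):-len("</wrapper>")]


def xml_literal_equal(a: str, b: str) -> bool:
    # Equal if both fragments have the same canonical form, so attribute
    # order, quote style and whitespace inside tags no longer matter.
    return canonical_content(a) == canonical_content(b)


print(xml_literal_equal("<a b='foo'     c='bar' />", '<a c="bar" b="foo"/>'))  # True
print(xml_literal_equal("<a b='foo'/>", "<a b='baz'/>"))  # False
```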

>>> Does the SPARQL spec say the same?
>>>
>>> Note that this is _not_ the same as if we replaced the two literals
>>> with, say, 1.0 and 1.00, declaring both to be floats. The way XML
>>> Literal is currently defined is such that the lexical form (not the
>>> value space!) is the canonical XML version.
>>
>> This is false. See above.
>
> I am not sure what 'This' refers to here...

My bad.

>> If it were true, then semantically the first graph would have a
>> not-well-formed literal, thus, semantically, would not be an instance of
>> rdfs:Literal.
>>
>
> I am not sure I understand that.:-(

Ok, in RDF/XML it all works out due to the horrible hack in the parse  
phase. My guess is that other serializations get this wrong.  
NTriples, for example.

Consider the following XMLLiteral serialization:

1) "abc"^^rdf:XMLLiteral

"abc" is not well formed, ergo 1 does *not* denote an XMLLiteral  
(according to RDF interpretations). Now consider:

2) "<a b='foo'     c='bar' />"^^rdf:XMLLiteral

If this appears inside a parseType=Literal, it will be coerced by the  
parser into an XMLLiteral. However, if you cut and paste it into an  
NTriples (or Turtle?) file, it will *not* be coerced (since the  
parser doesn't touch what's between the "s) and thus will *not*  
denote an XMLLiteral (or, indeed, any literal at all). Actually, it
may be the case that if you assert "rdfs:Resource rdfs:subClassOf
rdfs:Literal." together with a graph including 2, you'll get a
contradiction.
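The difference between example 2 as written and what a parseType="Literal" parser produces can be checked mechanically. A minimal Python sketch, assuming that "in the rdf:XMLLiteral lexical space" can be approximated as "well-formed once wrapped in an element, and already equal to its own canonical form". It uses `xml.etree.ElementTree.canonicalize`, which implements Canonical XML 2.0 rather than the Exclusive XML Canonicalization RDF actually requires, and the function name `looks_canonical` is hypothetical.

```python
import xml.etree.ElementTree as ET


def looks_canonical(fragment: str) -> bool:
    # Approximate membership test for the rdf:XMLLiteral lexical space:
    # the fragment must parse (inside a wrapper element) and must already
    # be in canonical form, i.e. canonicalizing it changes nothing.
    try:
        wrapped = ET.canonicalize("<wrapper>" + fragment + "</wrapper>")
    except ET.ParseError:
        return False  # not well-formed
    return wrapped[len("<wrapper>"):-len("</wrapper>")] == fragment


print(looks_canonical("<a b='foo'     c='bar' />"))  # False: needs rewriting
print(looks_canonical('<a b="foo" c="bar"></a>'))    # True
print(looks_canonical("<a b='foo'>"))                # False: unclosed tag
```

Under this approximation, example 2 pasted verbatim into N-Triples fails the test, while the string a parseType="Literal" parser would produce from the same markup passes.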

Actually, this is another way that RDF/XML cannot encode RDF graphs.

3)	:a :p "<a b='foo'     c='bar' />"^^rdf:XMLLiteral.

is a legal RDF graph that is not RDF equivalent to:

4)	<rdf:Description rdf:about="a">
		<p rdf:parseType="Literal">
			<a b='foo'     c='bar' />
	...

and, in fact, I don't think it can be represented by any RDF/XML document.

We need a clear story, in the end. The story can be a bit  
complicated, but it should be clear. Ideally, a list of rules ;)

[snip]
> Reporting a bug to the RDF document is perfectly possible.

Or we could create a rec that supersedes the definition of
rdf:XMLLiteral.

> But this
> should be done by trying to understand how this part of the REC was
> created, i.e., contacting the original editors,

Meh. If the utility isn't apparent to practitioners today, then I'd  
say that whatever reasons there were then are moot.

> and probably refer to the
> community in some way or other.

This is key. I'd like to know what extant triplestores do and if they  
*do* canonicalize, whether they'd be willing to loosen it.

The smallest possible change would be to relax the lexical space, but  
leave RDF/XML parsing untouched. That wouldn't even change the number  
of graphs that RDF/XML can't serialize (correctly :)).

> It is, however, on the borderline
> whether this is a bug or a change in the Rec; the latter may become more
> touchy indeed.

XML 5th edition opens a world of possibilities ;)

> Personally, I do not really understand the reasons for this definition
> either. I am not a very good expert in XML, but I would have expected
> the lexical space of XMLLiteral to be well formed XML, and the value
> space to be the canonical XML version, or maybe even the Infoset as an
> abstract representation of the XML content. But, again, I am not an
> expert on all the details of XML:-(

The real killer is getting rid of "extra" namespaces, which means you
can't round-trip, e.g., XSLT through RDF/XML successfully.

Cheers,
Bijan.
Received on Monday, 14 September 2009 11:20:53 UTC