rdf:XMLLiteral and white spaces [was: Re: [DTB] summary of editorial issues (completes ACTION-552)]

I revisited the specification of the XMLLiteral datatype [1], and it is
not as bad as I thought.
Namely, every XMLLiteral must be in exclusive XML canonical form [2],
which means that it must be in the XML canonical form [3], which in turn
means that there may be no redundant whitespace inside tags (whitespace
in character content is significant and is retained).  So, the
normalization burden is put on the authors of the documents.

So, for example, "<a />"^^rdf:XMLLiteral is not a syntactically valid
constant in RIF, because "<a />" is not in the lexical space of
rdf:XMLLiteral: it is not in XML canonical form, which writes an empty
element as a start-end tag pair, "<a></a>".
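
To make the example concrete, here is a rough sketch of such a lexical
check in Python. It rests on assumptions: the helper name is_canonical is
made up for illustration, and the standard library's canonicalize()
implements C14N 2.0, not the exclusive canonicalization of [2], so it is
only a stand-in that should agree for simple inputs without namespaces or
comments. It also only accepts a single root element, whereas an
XMLLiteral may be any well-balanced XML content.

```python
from xml.etree.ElementTree import canonicalize  # available since Python 3.8


def is_canonical(lexical_form: str) -> bool:
    """Return True if canonicalization leaves the string unchanged.

    Assumption: C14N 2.0 is used as an approximation of exclusive
    canonical XML; input must be a single well-formed element.
    """
    return canonicalize(lexical_form) == lexical_form


print(is_canonical("<a />"))       # empty-element tag with extra space
print(is_canonical("<a></a>"))     # start-end tag pair
print(is_canonical("<a> x </a>"))  # whitespace inside character content
```

Under these assumptions the first check should fail and the other two
should pass, since only whitespace inside tags is normalized away.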

In conclusion, whitespace normalization in checking XMLLiteral equality
is a moot point.


Best, Jos

[1] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral
[2] http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/
[3] http://www.w3.org/TR/2001/REC-xml-c14n-20010315

Jos de Bruijn wrote:
>>>>> 14) Editor's Note: Predicates for rdf:XMLLiteral such as at least
>>>>> comparison predicates (equals, not-equals) are still under discussion in
>>>>> the working group.
>>>>>
>>>>> PROPOSED: introduce equals and not-equals for XMLLiteral which matches
>>>>> modulo white-spaces in non-text content.
>>>> Two XML literals are equal if their values (as defined in [1]) are the
>>>> same and not-equal if their values are not the same. I cannot imagine
>>>> any other meaningful definition for equality of XML literals.
>>>>
>>>> [1] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral
>>> ok, that doesn't include white-space normalization or the like...
>> If you want to have whitespace normalization, you should either use a
>> different data type or introduce a function for this kind of
> 
> Actually, using a different data type might not be a bad idea.  I think
> it was a mistake of the RDF working group to have a one-to-one
> correspondence between the lexical and value space.  It would have been
> better to map XML content in the lexical space to the corresponding XML
> infoset, which is independent from the particular serialization.
> Unfortunately, I realize this just now.  I guess it's too late to change
> it in RIF.
> 
> Best, Jos
> 
> 

-- 
Jos de Bruijn            debruijn@inf.unibz.it
+390471016224         http://www.debruijn.net/
----------------------------------------------
No one who cannot rejoice in the discovery of
his own mistakes deserves to be called a
scholar.
  - Donald Foster

Received on Wednesday, 27 August 2008 15:21:33 UTC