An analysis of whether we should include rdf:XMLLiteral into OWL 2 (ACTION-244) from Boris Motik on 2008-11-14 (public-owl-wg@w3.org from November 2008)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Fri, 14 Nov 2008 23:38:01 -0000
To: "'W3C OWL Working Group'" <public-owl-wg@w3.org>
Message-ID: <E114E6D9D3B04EC5B57E55C443186734@wolf>

Hello,

At the last teleconf I was tasked to investigate whether we should include the rdf:XMLLiteral datatype in OWL 2. Here are my
findings.

In principle, there are no technical problems with including rdf:XMLLiteral in OWL 2. If we choose to do so, we should make the value
space of rdf:XMLLiteral disjoint from the value spaces of all other datatypes (including the various string variants). Furthermore,
we should not provide any facets on the datatype. Under such a definition, the datatype always has an infinite value space, so it
does not cause problems for reasoning.

I am not convinced, however, that this datatype is all that useful. In fact, the datatype's definition seems to contain a feature
that may pose a significant hurdle to the practical usage of the datatype. The definition of the lexical space from

http://www.w3.org/TR/rdf-concepts/#dfn-rdf-XMLLiteral

says the following:

The lexical space
    is the set of all strings:
    - which are well-balanced, self-contained XML content [XML];
    - for which encoding as UTF-8 [RFC 2279] yields exclusive Canonical XML (with comments, with
      empty InclusiveNamespaces PrefixList) [XML-XC14N];
    - for which embedding between an arbitrary XML start tag and an end tag yields a document
      conforming to XML Namespaces [XML-NS].

It defines the value space of the datatype as being in a one-to-one relationship with the lexical space.

Now I believe that the second condition actually poses significant hurdles to practical usage of the datatype, as it requires XML
lexical values to be canonicalized. This means that, for example, the following literal is syntactically incorrect:

"<a/>"^^rdf:XMLLiteral

The canonical form of XML embedded in this literal is <a></a>, so this is what you are supposed to write if you want to produce
syntactically valid lexical values of rdf:XMLLiteral.

The canonicalization process is quite complex, and most otherwise reasonable XML documents are not in canonical form. This means that
you cannot use rdf:XMLLiteral to represent most reasonable XML fragments as they are written; they would have to be canonicalized first.
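
To make the canonical-form requirement concrete, here is a minimal Python sketch of the kind of check an implementation would need. It rests on loud assumptions: it uses xml.etree.ElementTree.canonicalize (Python 3.8+), which implements C14N 2.0 rather than the Exclusive Canonical XML that the RDF definition requires, so it only approximates the real check and is meant for simple fragments without namespaces. The helper names canonical_form and is_canonical and the wrapper element <w> are hypothetical, introduced only for illustration.

```python
from xml.etree.ElementTree import ParseError, canonicalize

# Hypothetical wrapper element: an rdf:XMLLiteral lexical form is a
# well-balanced fragment, not necessarily a single-rooted document,
# so we wrap it before parsing and strip the wrapper afterwards.
_OPEN, _CLOSE = "<w>", "</w>"


def canonical_form(fragment: str) -> str:
    """Approximate the canonical form of a well-balanced XML fragment.

    Assumption: C14N 2.0 stands in for Exclusive C14N here, which is
    only reasonable for simple fragments without namespaces.
    """
    wrapped = _OPEN + fragment + _CLOSE
    canonical = canonicalize(wrapped, with_comments=True)
    # Remove the wrapper's start and end tags from the result.
    return canonical[len(_OPEN):len(canonical) - len(_CLOSE)]


def is_canonical(fragment: str) -> bool:
    """Return True if the fragment already equals its canonical form."""
    try:
        return canonical_form(fragment) == fragment
    except ParseError:
        # Not well-formed XML, so not in the lexical space at all.
        return False


if __name__ == "__main__":
    for text in ["<a/>", "<a></a>", "<a x='1'></a>", "<a>"]:
        print(repr(text), is_canonical(text))
```

Under this approximation, "<a/>" is rejected while "<a></a>" is accepted, and even a harmless choice such as single-quoted attribute values makes a fragment non-canonical.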

Given this situation, I'm really wondering whether we really need this datatype in OWL 2. It would introduce an implementation hurdle
(implementations would need to check whether all literals are correctly typed, and to do this they must implement the complex
canonicalization process) without an obvious benefit. Furthermore, I wonder if there is an OWL 1 implementation that correctly
implements this datatype (I would strongly suspect that there is none). Finally, since the datatype map of OWL 2 is open to
extensions, implementations are free to implement this datatype if they really need it.

The latter is just my opinion; undoubtedly you'll let me know what yours is :-)

Regards,

Boris

Received on Friday, 14 November 2008 23:38:44 UTC