Test cases: XML Literal value space and exclusive canonicalization

This message is prompted by some details in the recent discussion
about XML Literals between Pat Hayes and Benja Fallenstein.
I have tried to express this as much as possible as test cases.


There are three somewhat related issues:
A) Lexical space of XML Literals vs. allowed syntax in elements
    with rdf:parseType="Literal".
B) Allowed syntax with rdf:datatype="&rdf;XMLLiteral"
C) Context information for rdf:parseType="Literal"

First to A):

Two recent messages from Pat Hayes say that the lexical space
of XML Literals and the value space are in 1:1 correspondence:

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Aug/0026.html
 >>>>
"Note that the XML values of well-typed XML literals are in precise
1:1 correspondence with the XML literal strings of such literals, but
are not themselves character strings."
 >>>>

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0452.html
 >>>>
The lexical-to-value mapping is a 1:1 mapping from the lexical space
onto the value space. The value of the lexical-to-value mapping
 >>>>

This lets me ask the following test-based questions:

Do the following two RDF/XML documents entail the same graph?

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:eg="http://example.org/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><br/></eg:bar>
  </rdf:Description>
</rdf:RDF>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:eg="http://example.org/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><br></br></eg:bar>
  </rdf:Description>
</rdf:RDF>

The reason I ask is that the first one uses "<br/>", which
is not canonical. If the content of an element marked with
rdf:parseType="Literal" has to be a lexical form of the
XML Literal datatype, and the lexical space is in 1:1
correspondence with the (canonical) value space, then the
first example would be illegal. Please confirm that the
first example is legal, and that the two examples give
the same graph.
Also, please clarify, wherever necessary in the specs, that
the content of an element marked with rdf:parseType="Literal"
is not the literal value of the XML Literal, and make sure
that this is covered by an appropriate test case.
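
As a rough illustration of the comparison asked for above, the
following Python sketch canonicalizes both spellings of the element
content. It assumes Python 3.8 or later and uses the standard
library's xml.etree.ElementTree.canonicalize, which implements
Canonical XML 2.0 rather than the Exclusive XML Canonicalization the
RDF drafts refer to. For input this small the two are expected to
agree, but that is an assumption of the sketch, not a statement about
what any RDF parser does.

```python
from xml.etree.ElementTree import canonicalize

# Content of the two eg:bar elements from the documents above.
empty_tag_form = "<br/>"
start_end_form = "<br></br>"

# canonicalize() returns the canonical serialization as a string.
canon_a = canonicalize(empty_tag_form)
canon_b = canonicalize(start_end_form)

print(canon_a)             # <br></br>
print(canon_a == canon_b)  # True
```

If both inputs are accepted by the parser and canonicalized this way,
the two documents would carry the same literal string in the graph.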

If the first example were not allowed, this would create
an internationalization problem: it would be impossible
to encode an RDF/XML document with <?xml version='1.0' encoding='us-ascii'?>
and still include characters outside US-ASCII (by means of
numeric character references), because the canonical form,
for the most part, does not allow numeric character references.
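
The effect on numeric character references can be sketched in the
same way. This again assumes Python 3.8+ and the standard library's
Canonical XML 2.0 implementation as a stand-in for exclusive
canonicalization; the element name "p" and the sample text are made
up for the illustration.

```python
from xml.etree.ElementTree import canonicalize

# U+00E9 written as a numeric character reference, which is the only
# way to include it in a document encoded as us-ascii.
ascii_safe = "<p>caf&#233;</p>"

canonical = canonicalize(ascii_safe)

# The canonical form carries the character itself, not the reference.
print(canonical == "<p>caf\u00e9</p>")  # True
print(ascii_safe.isascii())              # True
print(canonical.isascii())               # False
```

So a rule requiring the element content to be canonical already would
exclude the only spelling available in a us-ascii encoded document.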


Now to B)

In an earlier mail
(http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0410.html),
I asked about the case of:

<rdf:Description>
   <eg:prop rdf:parseType="Literal"><em>foo</em></eg:prop>
   <eg:prop rdf:datatype="&rdf;XMLLiteral">&lt;em>foo&lt;/em></eg:prop>
</rdf:Description>

(Jeremy says that this results in a single triple.)
Now let's change this to:

<rdf:Description>
   <eg:prop rdf:parseType="Literal"><br/></eg:prop>
   <eg:prop rdf:datatype="&rdf;XMLLiteral">&lt;br/></eg:prop>
</rdf:Description>

Given the discussion under A), it seems to me that the most
plausible result of this is that the first line produces a
triple, but the second line is illegal, because the string
"<br/>" isn't canonicalized. So the correct case that leads
to a single triple would be:

<rdf:Description>
   <eg:prop rdf:parseType="Literal"><br/></eg:prop>
   <eg:prop rdf:datatype="&rdf;XMLLiteral">&lt;br>&lt;/br></eg:prop>
</rdf:Description>

If this is the correct interpretation, then a test case
making <eg:prop rdf:datatype="&rdf;XMLLiteral">&lt;br/></eg:prop>
illegal (and another showing that
<eg:prop rdf:datatype="&rdf;XMLLiteral">&lt;br>&lt;/br></eg:prop>
is legal) should be added. As I have explained in
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0410.html,
I would prefer to make rdf:datatype="&rdf;XMLLiteral" in
the RDF/XML syntax illegal, to make things easier for the
parser.

If the third solution is taken, namely that
<eg:prop rdf:datatype="&rdf;XMLLiteral">&lt;br/></eg:prop>
is legal, that would mean that XML Literal is a strange
special case: it would be the only datatype for which the
straightforward rdf:datatype notation allows more than the
values in the lexical space.
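
Under the first interpretation, a parser would have to check whether
the string given in the typed-literal notation is already in canonical
form. A minimal sketch of such a check follows; the helper name
is_canonical is hypothetical, and the standard library's Canonical
XML 2.0 routine (Python 3.8+) is used only as an approximation of
exclusive canonicalization.

```python
from xml.etree.ElementTree import canonicalize


def is_canonical(lexical_form: str) -> bool:
    """Hypothetical check: is the string its own canonical form?"""
    return canonicalize(lexical_form) == lexical_form


# The strings as seen after XML unescaping of the attribute-typed content.
print(is_canonical("<em>foo</em>"))  # True
print(is_canonical("<br/>"))         # False
print(is_canonical("<br></br>"))     # True
```

This is the extra work a parser takes on if the typed-literal notation
stays legal but is restricted to the lexical space.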


The third issue, C), is about context information for
rdf:parseType="Literal". The following two test documents
illustrate the situation:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:eg="http://example.org/"
          xmlns:eg2="http://example.com/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><eg2:br/></eg:bar>
  </rdf:Description>
</rdf:RDF>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:eg="http://example.org/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><eg2:br xmlns:eg2="http://example.com/"></eg2:br></eg:bar>
  </rdf:Description>
</rdf:RDF>

My reading of the current spec is that both examples produce
the same graph, and that the canonicalization (and therefore,
according to the discussion above, the literal value) of
the literal in the graph is:

"<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"

If this is not true, please tell me what happens in the
above case.

This example shows that in the literal value (based on
canonicalization) the context, in particular the namespace
declarations, is internalized as described by Pat, whereas
in the RDF/XML syntax this does not have to be the case.
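
The internalization of the namespace context can be sketched as
follows. The wrapper element "w" is a hypothetical stand-in for the
enclosing RDF/XML, and the sketch again relies on Python 3.8+ and the
standard library's Canonical XML 2.0 implementation, which, like
exclusive canonicalization, is expected to emit a namespace
declaration on the element that uses it rather than on the ancestor
that declared it.

```python
from xml.etree.ElementTree import canonicalize

# Declaration inherited from an ancestor, empty-element tag.
inherited = '<w xmlns:eg2="http://example.com/"><eg2:br/></w>'
# Declaration on the element itself, separate start and end tags.
local = '<w><eg2:br xmlns:eg2="http://example.com/"></eg2:br></w>'

canon_inherited = canonicalize(inherited)
canon_local = canonicalize(local)

# Both spellings are expected to canonicalize to the same string,
# with the declaration moved onto eg2:br.
print(canon_inherited == canon_local)
print(canon_inherited)
```

A real parser would canonicalize only the element content, not a
wrapper, so this shows the principle rather than the exact procedure.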


Regards,    Martin.

Received on Sunday, 3 August 2003 18:24:01 UTC