Re: Test cases: XML Literal value space and exclusive canonicalization from Dave Beckett on 2003-08-04 (w3c-rdfcore-wg@w3.org from August 2003)

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Mon, 4 Aug 2003 12:12:41 +0100
To: Martin Duerst <duerst@w3.org>
Cc: www-rdf-comments@w3.org, pat hayes <phayes@ihmc.us>, Benja Fallenstein <b.fallenstein@gmx.de>, Jeremy Carroll <jjc@hplb.hpl.hp.com>, w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
Message-Id: <20030804121241.022ebc44.dave.beckett@bristol.ac.uk>
On Sun, 03 Aug 2003 17:36:46 -0400
Martin Duerst <duerst@w3.org> wrote:

> This message is prompted by some details in the recent discussion
> about XML Literals between Pat Hayes and Benja Fallenstein.
> I have tried to express this as much as possible as test cases.
> 
> 
> There are two somewhat related issues:
> A) Lexical space of XML Literals vs. allowed syntax in elements
>     with rdf:parseType="Literal".
> B) Allowed syntax with rdf:dataType="&rdf;XMLLiteral"
> C) Context information for rdf:parseType="Literal"
> 
> First to A):
> 
> Two recent messages from Pat Hayes say that the lexical space
> of XML Literals and the value space is in 1:1 correspondence:
> 
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Aug/0026.html
>  >>>>
> "Note that the XML values of well-typed XML literals are in precise
> 1:1 correspondence with the XML literal strings of such literals, but
> are not themselves character strings."
>  >>>>
> 
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0452.html
>  >>>>
> The lexical-to-value mapping is a 1:1 mapping from the lexical space
> onto the value space. The value of the lexical-to-value mapping
>  >>>>

Those are about questions in the RDF graph.

The RDF graph is an abstract syntax of triples, and is separate
from the RDF/XML syntax which is the concrete one.

> This lets me ask the following test-based questions:

Which you do in the RDF/XML syntax; however, we usually pose
questions about details of the graph in our test format for
the graph, N-Triples.

> Do the following two RDF/XML documents entail the same graph?
> 
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>           xmlns:eg="http://example.org/">
>   <rdf:Description rdf:about="http://example.org/foo">
>     <eg:bar rdf:parseType="Literal"><br/></eg:bar>
>   </rdf:Description>
> </rdf:RDF>

In the graph
  <http://example.org/foo> <http://example.org/bar> "<br></br>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral> .

So you can see that the RDF/XML Syntax spec gives you the "<br></br>"
canonicalized XML Unicode string (yes, I know. Leave that issue for now).

> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>           xmlns:eg="http://example.org/">
>   <rdf:Description rdf:about="http://example.org/foo">
>     <eg:bar rdf:parseType="Literal"><br></br></eg:bar>
>   </rdf:Description>
> </rdf:RDF>

  <http://example.org/foo> <http://example.org/bar> "<br></br>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral> .

Same triple.
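
A minimal sketch of why both spellings end up as the same lexical form.
It assumes Python 3.8+ and uses the standard library's
xml.etree.ElementTree.canonicalize, which implements Canonical XML 2.0
rather than the exclusive canonicalization (exc-C14N) discussed here;
for an empty element with no namespaces or attributes the two should
agree. The variable names are illustrative only.

```python
from xml.etree.ElementTree import canonicalize

# Canonicalize the two spellings of an empty element.
# Assumption: C14N 2.0 stands in for exc-C14N on this trivial input.
empty_tag = canonicalize("<br/>")
start_end_pair = canonicalize("<br></br>")

print(empty_tag)
print(empty_tag == start_end_pair)
```

Both inputs come out as a start-tag/end-tag pair, which is the string
that appears as the lexical form in the N-Triples line above.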

> The reason why I ask this is that in the first one, "<br/>" is
> used, which is not canonical. If the content of an element
> marked with rdf:parseType="Literal" has to be the lexical
> value of of the XML Literal datatype, and the lexical value
> is in 1:1 correspondence with the (canonical) value space,
> then the first example would be illegal. Please confirm that
> the first example is legal, and that the two examples give
> the same graph.

The "content of an element" is not in the graph (there are no elements
in the abstract syntax) and is not the lexical form (the concrete syntax
has no lexical forms either, they are in the graph).

Both RDF/XML examples are legal and give the same graph.

> Also, please clarify, wherever necessary in the specs, that
> the content of an element marked with rdf:parseType="Literal"
> is not the literal value of the XML Literal, and make sure
> that this is covered by an appropriate test case.

Given that exc-C14N produces octets and we wanted a Unicode string, I am
rewriting that part of the RDF/XML syntax document.

This particular case, <br/> exc-canonicalizing to octets equivalent to
the Unicode "<br></br>", doesn't happen to be covered by our test cases,
but we are not providing an exc-C14N test suite.  I can add it.

> In case the first one should not be allowed, this creates
> an internationalization problem, because it would be impossible
> to encode an RDF/XML document with <?xml version='1.0' encoding='us-ascii'?>
> and still include characters outside US-ASCII (with numeric
> character references), because numeric character references
> for the most part are not allowed in the canonicalization.

They are the same triple.  XML Canonicalization happens in mapping from
the concrete syntax to the abstract.
 
So that means there is no problem with A).
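
As an illustration of the internationalization point: a numeric
character reference in the concrete syntax and the literal character
canonicalize to the same string, so an ASCII-only RDF/XML document can
still carry non-ASCII literal content. This is a sketch assuming
Python 3.8+ and the standard library's canonicalize (Canonical XML 2.0,
used here as a stand-in for exc-C14N); the sample text is made up.

```python
from xml.etree.ElementTree import canonicalize

# The same content written two ways in the concrete syntax:
# once with a numeric character reference, once with the character itself.
with_reference = canonicalize("<p>caf&#233;</p>")
with_character = canonicalize("<p>caf\u00e9</p>")

# The parser resolves the reference before canonicalization sees it,
# so both results contain U+00E9 directly.
print(with_reference == with_character)
```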


> Now to B)
> 
> In an earlier mail
> (http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0410.html),
> I asked about the case of:
> 
> <rdf:Description>
>    <eg:prop rdf:parseType="Literal"><em>foo</em></eg:prop>
>    <eg:prop rdf:dataType="&rdf;XMLLiteral">&lt;em>foo&lt;/em></eg:prop>

(Aside: here and below, rdf:datatype, not rdf:dataType, is the correct
attribute name)

> </rdf:Description>
> 
> (for which Jeremy says that this results in one single triple).

Yes, both result in the same triple in the graph.

> Now let's change this to:
> 
> <rdf:Description>
>    <eg:prop rdf:parseType="Literal"><br/></eg:prop>
>    <eg:prop rdf:dataType="&rdf;XMLLiteral">&lt;br/></eg:prop>
> </rdf:Description>

These produce different triples in the graph; their lexical forms are
the Unicode strings:
  "<br></br>"
  "<br/>"

> Given the discussion under A), it seems to me that the most
> plausible result of this is that the first line produces a
> triple, but the second line is illegal, because the string
> "<br/>" isn't cannonicalized ...

"Illegal" is vague.  It is legal XML and legal RDF/XML.  However,
in the graph it might be an ill-formed XML literal (PatH will
have the right term).

> .... So the correct case that leads
> to a single triple would be:
> 
> <rdf:Description>
>    <eg:prop rdf:parseType="Literal"><br/></eg:prop>
>    <eg:prop rdf:dataType="&rdf;XMLLiteral">&lt;br>&lt;/br></eg:prop>
> </rdf:Description>

These produce the same triple in the graph; their lexical forms are
the Unicode strings:
  "<br></br>"
  "<br></br>"

> 
> If this is the correct interpretation, then a test case
> making <eg:prop rdf:dataType="&rdf;XMLLiteral">&lt;br/></eg:prop>
> illegal (and another showing that
> <eg:prop rdf:dataType="&rdf;XMLLiteral">&lt;br>&lt;/br></eg:prop>
> is legal) should be added. As I have explained in
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0410.html,
> I would prefer it to make rdf:dataType="&rdf;XMLLiteral" in
> the RDF/XML syntax illegal, to make things easier for the
> parser.

We didn't want to require people to have an XML parser
for handling RDF's abstract syntax, so all the XML checking
belongs in the mapping from RDF/XML to the triples.

It might make sense to forbid rdf:datatype with the URI of rdf:XMLLiteral
for the reason you give - to make things easier for the parser.  Do
you feel it makes things easier for the user too?

If we do ban it, that would mean no problem with B), yes?
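
The two rdf:datatype cases discussed under B) can be sketched as
follows. The helper names are hypothetical and only model the behaviour
described in this message: content under rdf:parseType="Literal" is
canonicalized on the way into the graph, while the text content of an
rdf:datatype property is taken as-is. Python's canonicalize
(Canonical XML 2.0, Python 3.8+) is assumed as a stand-in for exc-C14N.

```python
from xml.etree.ElementTree import canonicalize

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

def from_parse_type_literal(element_content):
    # Hypothetical helper: the XML content is canonicalized when mapping
    # from the concrete syntax to the graph.
    return (canonicalize(element_content), XML_LITERAL)

def from_rdf_datatype(text_content):
    # Hypothetical helper: the (already unescaped) character data is used
    # unchanged as the lexical form.
    return (text_content, XML_LITERAL)

def distinct_objects(element_content, text_content):
    # A set collapses identical (lexical form, datatype) pairs, the way a
    # graph holds only one copy of an identical triple.
    return {from_parse_type_literal(element_content),
            from_rdf_datatype(text_content)}

# <br/> versus &lt;br/>  : lexical forms differ, so two objects
case_1 = distinct_objects("<br/>", "<br/>")
# <br/> versus &lt;br>&lt;/br> : lexical forms match, so one object
case_2 = distinct_objects("<br/>", "<br></br>")

print(len(case_1), len(case_2))
```

Nothing here checks whether the rdf:datatype string is a well-formed
XML literal; that question is left open, as it is in the reply above.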


> In case the third solution is taken, namely that
> <eg:prop rdf:dataType="&rdf;XMLLiteral">&lt;br/></eg:prop>
> is legal, that would mean that for XML Literal datatypes,
> there is a strange special case in that they are the only
> case where the straightforward rdf:dataType notation allows
> more than the values in the lexical space.
> 
> 
> The third issue, C), is about context information for
> rdf:parseType="Literal". The following two test documents
> illustrate the situation:

What is context?

> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>           xmlns:eg="http://example.org/"
>           xmlns:eg2="http://example.com/">
>   <rdf:Description rdf:about="http://example.org/foo">
>     <eg:bar rdf:parseType="Literal"><eg:br/></eg:bar>
>   </rdf:Description>
> </rdf:RDF>
> 
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>           xmlns:eg="http://example.org/">
>   <rdf:Description rdf:about="http://example.org/foo">
>     <eg:bar rdf:parseType="Literal"><eg2:br 
> xmlns:eg2="http://example.com/"></eg2:br></eg:bar>
>   </rdf:Description>
> </rdf:RDF>
> 
> My reading of the current spec is that both examples produce
> the same graph, and that the canonicalization (and therefore,
> according to the discussion above, the literal value) of
> the literal in the graph is:
> 
> "<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"
> 
> If this is not true, please tell me what happens in the
> above case.

The whitespace is different in your examples and is significant.
Assuming that is a mistake, then apart from that, both lexical values
are as given above.

> This example shows that while in the literal value
> (based on canonicalization), the context (in particular
> namespace declarations) is internalized as described by
> Pat, in the RDF/XML syntax, this does not have to be
> the case.

I don't understand this point or see what the problem is here.
What document must we change to fix it?

Dave
Received on Monday, 4 August 2003 07:14:59 UTC