- From: Martin Duerst <duerst@w3.org>
- Date: Sun, 03 Aug 2003 17:36:46 -0400
- To: www-rdf-comments@w3.org, pat hayes <phayes@ihmc.us>, Benja Fallenstein <b.fallenstein@gmx.de>, Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Cc: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
This message is prompted by some details in the recent discussion about XML Literals between Pat Hayes and Benja Fallenstein. I have tried to express this as much as possible as test cases.

There are three somewhat related issues:

A) Lexical space of XML Literals vs. allowed syntax in elements with rdf:parseType="Literal".
B) Allowed syntax with rdf:dataType="&rdf;XMLLiteral"
C) Context information for rdf:parseType="Literal"

First to A): Two recent messages from Pat Hayes say that the lexical space of XML Literals and the value space are in 1:1 correspondence:

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Aug/0026.html
>>>>
"Note that the XML values of well-typed XML literals are in precise 1:1 correspondence with the XML literal strings of such literals, but are not themselves character strings."
>>>>

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0452.html
>>>>
The lexical-to-value mapping is a 1:1 mapping from the lexical space onto the value space. The value of the lexical-to-value mapping
>>>>

This lets me ask the following test-based question: Do the following two RDF/XML documents entail the same graph?

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><br/></eg:bar>
  </rdf:Description>
</rdf:RDF>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><br></br></eg:bar>
  </rdf:Description>
</rdf:RDF>

The reason why I ask this is that the first one uses "<br/>", which is not canonical. If the content of an element marked with rdf:parseType="Literal" has to be the lexical value of the XML Literal datatype, and the lexical value is in 1:1 correspondence with the (canonical) value space, then the first example would be illegal.
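The question in A) can be sketched as a small check. This is only an illustration under stated assumptions: it uses Python's xml.etree.ElementTree.canonicalize (C14N 2.0, Python 3.8+) as a stand-in for whichever canonicalization the RDF specs prescribe, and the variable names are hypothetical. It compares the content of the two eg:bar elements above.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+, implements C14N 2.0

# Content of the two eg:bar elements from the test documents above.
literal_a = "<br/>"
literal_b = "<br></br>"

# Assumption: C14N 2.0 stands in for the canonicalization used by the RDF specs.
canon_a = canonicalize(literal_a)
canon_b = canonicalize(literal_b)

# Do both literals map to the same canonical form (and hence the same value)?
same_value = canon_a == canon_b

# Is each string already in canonical form, i.e. unchanged by canonicalization?
a_is_canonical = canon_a == literal_a
b_is_canonical = canon_b == literal_b

print(same_value, a_is_canonical, b_is_canonical)
```

Under these assumptions the two strings canonicalize to the same form, but only the second is itself canonical, which is exactly the gap between "allowed syntax" and "lexical space" that the test cases are meant to probe.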
Please confirm that the first example is legal, and that the two examples give the same graph. Also, please clarify, wherever necessary in the specs, that the content of an element marked with rdf:parseType="Literal" is not the literal value of the XML Literal, and make sure that this is covered by an appropriate test case.

In case the first one should not be allowed, this creates an internationalization problem: it would be impossible to encode an RDF/XML document with

<?xml version='1.0' encoding='us-ascii'?>

and still include characters outside US-ASCII (with numeric character references), because numeric character references for the most part are not allowed in the canonicalization.

Now to B): In an earlier mail (http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0410.html), I asked about the case of:

<rdf:Description>
  <eg:prop rdf:parseType="Literal"><em>foo</em></eg:prop>
  <eg:prop rdf:dataType="&rdf;XMLLiteral"><em>foo</em></eg:prop>
</rdf:Description>

(Jeremy says that this results in one single triple.) Now let's change this to:

<rdf:Description>
  <eg:prop rdf:parseType="Literal"><br/></eg:prop>
  <eg:prop rdf:dataType="&rdf;XMLLiteral"><br/></eg:prop>
</rdf:Description>

Given the discussion under A), it seems to me that the most plausible result of this is that the first line produces a triple, but the second line is illegal, because the string "<br/>" isn't canonicalized. So the correct case that leads to a single triple would be:

<rdf:Description>
  <eg:prop rdf:parseType="Literal"><br/></eg:prop>
  <eg:prop rdf:dataType="&rdf;XMLLiteral"><br></br></eg:prop>
</rdf:Description>

If this is the correct interpretation, then a test case making

<eg:prop rdf:dataType="&rdf;XMLLiteral"><br/></eg:prop>

illegal (and another showing that

<eg:prop rdf:dataType="&rdf;XMLLiteral"><br></br></eg:prop>

is legal) should be added.
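The interpretation in B) amounts to a lexical-space membership test: a string is acceptable only if canonicalization leaves it unchanged. The following is a minimal sketch of that reading, not a statement of what the specs require. The helper name in_lexical_space is hypothetical, and Python's xml.etree.ElementTree.canonicalize (C14N 2.0) is again assumed as a stand-in for the canonicalization the RDF specs prescribe. The third sample adds a numeric character reference to illustrate the internationalization concern.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+, implements C14N 2.0


def in_lexical_space(xml_text: str) -> bool:
    """Hypothetical check: True if xml_text is already in canonical form.

    Assumption: the lexical space of the XML Literal datatype consists of
    exactly those strings that canonicalization leaves unchanged.
    """
    return canonicalize(xml_text) == xml_text


# Strings as they would appear inside eg:prop in the examples above.
results = {
    "<em>foo</em>": in_lexical_space("<em>foo</em>"),
    "<br/>": in_lexical_space("<br/>"),
    "<br></br>": in_lexical_space("<br></br>"),
    # A numeric character reference is expanded by canonicalization,
    # so the ASCII-only spelling is not itself canonical.
    "<em>&#233;</em>": in_lexical_space("<em>&#233;</em>"),
}

for text, ok in results.items():
    print(ascii(text), ok)
```

If the content written with rdf:dataType had to pass such a test, an ASCII-encoded RDF/XML document could not carry non-ASCII text in that form, which is the problem raised above.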
As I have explained in http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0410.html, I would prefer to make rdf:dataType="&rdf;XMLLiteral" illegal in the RDF/XML syntax, to make things easier for the parser.

In case the third solution is taken, namely that

<eg:prop rdf:dataType="&rdf;XMLLiteral"><br/></eg:prop>

is legal, that would mean that XML Literal datatypes are a strange special case: they would be the only case where the straightforward rdf:dataType notation allows more than the values in the lexical space.

The third issue, C), is about context information for rdf:parseType="Literal". The following two test documents illustrate the situation:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/"
         xmlns:eg2="http://example.com/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><eg:br/></eg:bar>
  </rdf:Description>
</rdf:RDF>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><eg2:br xmlns:eg2="http://example.com/"></eg2:br></eg:bar>
  </rdf:Description>
</rdf:RDF>

My reading of the current spec is that both examples produce the same graph, and that the canonicalization (and therefore, according to the discussion above, the literal value) of the literal in the graph is:

"<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"

If this is not true, please tell me what happens in the above case.

This example shows that while in the literal value (based on canonicalization) the context (in particular namespace declarations) is internalized as described by Pat, this does not have to be the case in the RDF/XML syntax.

Regards, Martin.
Received on Sunday, 3 August 2003 18:24:01 UTC