- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Mon, 4 Aug 2003 17:49:56 +0100
- To: Martin Duerst <duerst@w3.org>
- Cc: www-rdf-comments@w3.org, pat hayes <phayes@ihmc.us>, Benja Fallenstein <b.fallenstein@gmx.de>, Jeremy Carroll <jjc@hplb.hpl.hp.com>, w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
On Mon, 04 Aug 2003 10:55:01 -0400
Martin Duerst <duerst@w3.org> wrote:

> Hello Dave,
>
> Many thanks for your quick and detailed reply!

Let me cut this down a bit then :)

<snip/>

> Okay. I was trying to ask this because I assume that in all
> cases except XML Literals, the syntax allowed in RDF/XML is
> that defined by the lexical space of the datatype (modulo
> XML character escaping). Is this the case?

In RDF/XML, the lexical space that you can write into XML is
constrained by XML's alphabet - a subset of Unicode defined in the
particular XML specification being used. The lexical space of RDF
literals (including the datatyped literals) is a Unicode string
(sequence of Unicode characters). I think we've worked out that these
are not the same - some characters in a Unicode string cannot be
written in XML.

So, RDF/XML doesn't define it - either the XML specs do, or the RDF
abstract syntax does (defn of literals).

<snip/>

> >The "content of an element" is not in the graph (there are no elements
> >in the abstract syntax) and is not the lexical form
>
> I now understand that for XML Literals. What about all the other
> datatype literals?

Same thing. For example, the XSD integer 2 is not in the graph
either - RDF doesn't have such integers in its abstract syntax. So
the XSD:int rules are used to encode that datatype integer as a
Unicode string (I hope, or I'm lost).

In the RDF/XML, that Unicode string lexical form turns into a sequence
of Unicode characters (character InfoItems). These infoset items are
written in XML as character data, in some content encoding.

<snip/>

> >This particular part of <br/> exc-canonicalizing to octets equivalent to
> >the Unicode "<br></br>" doesn't happen to be tested in our test cases,
> >but we are not providing an exc-C14N test suite. I can add it.
>
> I agree that it would be a bad idea to try to provide an exc-C14N
> test suite.
> I think it would be good to add an example like this
> just to document how RDF/XML syntax, lexical value, and so on,
> are related, and in particular, that they are not exactly the same.

OK, noted.

> >They are the same triple. XML Canonicalization happens in mapping from
> >the concrete syntax to the abstract.
> >
> >So that means there is no problem with A).
>
> Very good, many thanks for the confirmation.

<snip/>

> > > Now to B)

<snip/>

> >illegal is vague. It is legal XML, legal RDF/XML. However
> >in the graph it might be an ill-formed XML literal (PatH will
> >have the right term).
>
> Okay. Are there examples of 'ill-formed' other literals in the
> test suite? If yes, it may be appropriate to add this one.

Yes. We have tests such as "010" xsd:int as a bad datatyped literal.
The phrase we are using is ill-typed, at which point the
interpretation in the semantics is different. See near (editor's
draft, take care)
  http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-mt-20030117/#illformedliteral

There are several tests below
  http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/
but one is:
  "With appropriate datatype knowledge, a 'badly-formed' datatyped
  literal can be detected."
  http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/Manifest.rdf#non-well-formed-literal-2
which checks that a bad integer "flargh"
  http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/test002.nt
does not conclude that it is an RDF datatype
  http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/test002b.nt

These are not required tests; they apply only if the particular
datatype (in this case XSD) is supported by the application.

<snip/>

> >It might make sense to forbid rdf:datatype with the URI of rdf:XMLLiteral
> >for the reason you give - to make things easier for the parser. Do
> >you feel it makes things easier for the user too?
>
> Here are some thoughts I have gone through:
> - It makes things somewhat different for software writing RDF/XML:
>   It can't just write out all types with rdf:datatype.
>   But this is probably a desirable effect.
> - For users, there are really a lot of users out there, and it's
>   not very easy to say in general. But in my view, it very much helps
>   them understand XML Literals if they see these literals always
>   at the same level of escaping. Most people seem to get confused
>   very quickly with different escaping levels. Using only
>   rdf:parseType='Literal' would mean that the basic escaping is
>   the same in the RDF/XML syntax, in the abstract syntax, and,
>   as far as I understand, in most implementations. I think
>   that is a serious benefit.
>
> >If we do ban it, that would mean no problem with B), yes?
>
> Yes, that means that problem B is gone.

Your summary of user issues there seems appropriate. Encoded XML does
look ugly too!

> > > The third issue, C), is about context information for
> > > rdf:parseType="Literal". The following two test documents
> > > illustrate the situation:
> >
> >What is context?
>
> Sorry to not be clear enough. By context, I meant everything
> outside the actual element content that represents the literal
> value. In particular the xmlns:eg2="http://example.com/"
> prefix declaration in the first example.
>
> > > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > >          xmlns:eg="http://example.org/"
> > >          xmlns:eg2="http://example.com/">
> > >   <rdf:Description rdf:about="http://example.org/foo">
> > >     <eg:bar rdf:parseType="Literal"><eg:br/></eg:bar>
> > >   </rdf:Description>
> > > </rdf:RDF>
> > >
> > > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > >          xmlns:eg="http://example.org/">
> > >   <rdf:Description rdf:about="http://example.org/foo">
> > >     <eg:bar rdf:parseType="Literal"><eg2:br
> > > xmlns:eg2="http://example.com/"></eg2:br></eg:bar>
> > >   </rdf:Description>
> > > </rdf:RDF>
> > >
> > > My reading of the current spec is that both examples produce
> > > the same graph, and that the canonicalization (and therefore,
> > > according to the discussion above, the literal value) of
> > > the literal in the graph is:
> > >
> > > "<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"
> > >
> > > If this is not true, please tell me what happens in the
> > > above case.
> >
> >The whitespace is different in your examples and is significant.
> >Assuming that is a mistake, then apart from that, both lexical values
> >are as given above.
>
> There is a linebreak instead of a space after <eg2:br in the
> second <rdf:RDF> piece. Is that what you meant? This was introduced
> by my mailer, which doesn't like long lines. Canonicalization should
> turn that back into a space again. Whitespace is significant in
> element content, for good reasons, but not inside start tags.

Oops, my mistake, yes I agree that is not a significant space and the
C14N will do as you say.

> My understanding is that the XML Literal in both cases will
> come out as:
>
> "<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"^^
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
>
> (I have added a linebreak after ^^ just to make sure that
> no other ones get added)
>
> You seem to agree.

Actually no.
The two examples use different namespace prefixes, which I hadn't
noticed the first time. Apart from that they will be the same. Did
you mean to move the namespace declaration and change the name of the
element?

> > > This example shows that while in the literal value
> > > (based on canonicalization), the context (in particular
> > > namespace declarations) is internalized as described by
> > > Pat, in the RDF/XML syntax, this does not have to be
> > > the case.
> >
> >I don't understand this point or see what the problem is here.
> >What document must we change to fix it?
>
> My guess is that currently, no document needs to change.
> But I wanted to make sure this was the case, and there were
> no misunderstandings about canonicalization and context
> (i.e. in an RDF/XML context, namespace prefix declarations
> could be far away from the actual literals where they apply.
> Once canonicalized, that's no longer the case).

I'm not sure there is an issue there: if the namespace prefixes are
intended to be different, then since exc-C14N doesn't rename prefixes,
the lexical forms will be different.

Dave
Received on Monday, 4 August 2003 12:50:23 UTC
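
The canonicalization points discussed in the message above can be tried
out with a short sketch. It makes two assumptions that are not from the
thread: it uses Python's xml.etree.ElementTree.canonicalize (Python 3.8
or later), which implements C14N 2.0 rather than the Exclusive XML
Canonicalization 1.0 that RDF/XML refers to, and it copies the in-scope
namespace declarations onto each literal's element by hand, which a real
RDF/XML parser would do itself. For inputs this small the behaviours in
question (empty-element expansion, whitespace inside start tags, no
prefix renaming) should come out the same, but treat the result as
illustrative only.

```python
from xml.etree.ElementTree import canonicalize

# Literal content of the first example. The namespace declarations that
# were in scope on <rdf:RDF> are copied onto the element by hand
# (an assumption of this sketch, not something the library does for us).
first = '<eg:br xmlns:eg="http://example.org/" xmlns:eg2="http://example.com/"/>'

# Literal content of the second example, keeping the mailer's line break
# inside the start tag.
second = '<eg2:br\nxmlns:eg2="http://example.com/"></eg2:br>'

c14n_first = canonicalize(first)
c14n_second = canonicalize(second)

print(c14n_first)
print(c14n_second)
print("same lexical form:", c14n_first == c14n_second)
```

The empty-element tag is written out as a start and end tag pair, and the
line break inside the second start tag becomes a single space. The
prefixes eg: and eg2: are left as written, so the two canonical strings
differ.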