Re: Test cases: XML Literal value space and exclusive canonicalization from Martin Duerst on 2003-08-04 (www-rdf-comments@w3.org from July to September 2003)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 04 Aug 2003 13:55:25 -0400
To: Dave Beckett <dave.beckett@bristol.ac.uk>
Cc: www-rdf-comments@w3.org, pat hayes <phayes@ihmc.us>, Benja Fallenstein <b.fallenstein@gmx.de>, Jeremy Carroll <jjc@hplb.hpl.hp.com>, w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
Message-Id: <4.2.0.58.J.20030804133907.00a92140@localhost>

Hello Dave,

again, thanks very much for your quick reply!

At 17:49 03/08/04 +0100, Dave Beckett wrote:

>On Mon, 04 Aug 2003 10:55:01 -0400
>Martin Duerst <duerst@w3.org> wrote:

> > Okay. I was trying to ask this because I assume that in all
> > cases except XML Literals, the syntax allowed in RDF/XML is
> > that defined by the lexical space of the datatype (modulo
> > XML character escaping). Is this the case?
>
>In RDF/XML, the lexical space that you can write into XML is constrained
>by XML's alphabet - a subset of Unicode defined in the particular XML
>specification being used.
>
>The lexical space of RDF literals (including the datatyped literals)
>is a Unicode string (sequence of Unicode characters).
>
>I think we've worked out that these are not the same - some characters
>in a Unicode string cannot be written in XML.

Yes, in particular most C0 control characters, in XML 1.0.
Most of that will change in XML 1.1. The NULL character
(U+0000) still isn't legal in XML 1.1, as far as I know.
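
To make the mismatch concrete, here is a minimal sketch that tests whether
a Unicode string falls inside the Char production of XML 1.0 or XML 1.1.
The function names are mine, not from any spec or library, and the 1.1
branch only checks the Char range: it ignores the extra rule that most
control characters must be written as character references there.

```python
def is_xml_char(ch: str, version: str = "1.0") -> bool:
    """Return True if ch matches the Char production of the given XML version."""
    cp = ord(ch)
    if version == "1.0":
        # XML 1.0: tab, LF, CR, then everything from U+0020 up,
        # minus surrogates, U+FFFE and U+FFFF.
        return (cp in (0x9, 0xA, 0xD)
                or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD
                or 0x10000 <= cp <= 0x10FFFF)
    # XML 1.1: everything from U+0001 up, with the same exclusions.
    # U+0000 stays excluded.
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def writable_in_xml(s: str, version: str = "1.0") -> bool:
    """Return True if every character of s is an XML Char in that version."""
    return all(is_xml_char(c, version) for c in s)
```

With this, a lexical form containing U+0001 is rejected for XML 1.0 but
accepted for XML 1.1, while one containing U+0000 is rejected for both.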


> > >The "content of an element" is not in the graph (there are no elements
> > >in the abstract syntax) and is not the lexical form
> >
> > I now understand that for XML Literals. What about all the other
> > datatype literals?
>
>Same thing.  For example, the XSD integer 2 is not in the graph either -
>RDF doesn't have such integers in its abstract syntax.  So the XSD:int
>rules are used to encode that datatype integer as a Unicode string (I
>hope, or I'm lost).
>
>In the RDF/XML, that Unicode string lexical form turns into a sequence
>of Unicode characters (character InfoItems). These infoset items are
>written in XML as character data, in some content encoding.

So it seems to be true, for simple datatype literals, that

Abstract syntax == lexical value ~= representation in RDF/XML

(the ~= including generic issues such as character escaping,
but no datatype-specific issues).
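
As a rough illustration of the "~=" step, the sketch below escapes a
lexical form for use as XML character data and then recovers it with a
parser. The element name eg:value and the function name are made up for
the example; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def round_trip(lexical_form: str) -> tuple[str, str]:
    """Escape a lexical form into XML character data, then parse it back.

    Returns (escaped_text, recovered_lexical_form).
    """
    escaped = escape(lexical_form)  # replaces &, < and > with entities
    # eg:value is a hypothetical property element, used only as a wrapper.
    doc = f'<eg:value xmlns:eg="http://example.org/">{escaped}</eg:value>'
    recovered = ET.fromstring(doc).text
    return escaped, recovered


escaped, recovered = round_trip("1 < 2 & 3 > 2")
```

The escaped text differs from the lexical form, but the recovered string
is identical to it, and nothing in the process depends on the datatype.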


> > I agree that it would be a bad idea to try to provide an exc-C14N
> > test suite. I think it would be good to add an example like this
> > just to document how RDF/XML syntax, lexical value, and so on,
> > are related, and in particular, that they are not exactly the same.
>
>OK, noted.

Thanks.


> > > > Now to B)
><snip/>
> > >illegal is vague.  It is legal XML, legal RDF/XML.  However
> > >in the graph it might be an ill-formed XML literal (PatH will
> > >have the right term).
> >
> > Okay. Are there examples of 'ill-formed' other literals in the
> > test suite? If yes, it may be appropriate to add this one.
>
>Yes.  We have tests such as "010" xsd:int as a bad datatyped literal.
>The phrase we are using is ill-typed, at which point the interpretation
>in the semantics is different.
>
>See near (editor's draft, take care)
>   http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-mt-20030117/#illformedliteral
>
>There are several tests below 
>http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/
>but one is:
>   "With appropriate datatype knowledge, a 'badly-formed' datatyped 
> literal can be detected."
> 
>http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/Manifest.rdf#non-well-formed-literal-2
>which checks that a bad integer "flargh"
>   http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/test002.nt
>does not conclude that it is an RDF datatype
>   http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/test002b.nt
>
>These are not required tests; only if the particular datatype (in this
>case XSD) is supported by the application.
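
To picture the kind of check described above, here is a minimal sketch of
a lexical-space test that only runs when the datatype is known to the
application. The table of supported datatypes, the regular expression and
the function name are my assumptions for illustration; I have also assumed
xsd:integer as the datatype, and this is not how any particular RDF
implementation does it.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical table of datatypes this application understands, mapping
# each to a pattern for its lexical space (optional sign, then digits).
SUPPORTED_LEXICAL_SPACES = {
    XSD_INTEGER: re.compile(r"[+-]?[0-9]+"),
}


def is_well_typed(lexical_form: str, datatype: str):
    """Return True/False for a supported datatype, None if it is unsupported."""
    pattern = SUPPORTED_LEXICAL_SPACES.get(datatype)
    if pattern is None:
        return None  # no datatype knowledge, so nothing can be detected
    return pattern.fullmatch(lexical_form) is not None
```

Here "flargh" typed as xsd:integer comes out as not well-typed, "2" comes
out as well-typed, and a literal with an unknown datatype gives None,
matching the point that the tests only apply when the datatype is
supported.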

Is every RDF application required to support XML Literals?
Or only the syntactic parts, and the RDF/XML to Graph conversion
(including canonicalization) if appropriate?





> > > > The third issue, C), is about context information for
> > > > rdf:parseType="Literal". The following two test documents
> > > > illustrate the situation:
> > >
> > >What is context?
> >
> > Sorry to not be clear enough. By context, I meant everything
> > outside the actual element content that represents the literal
> > value. In particular the xmlns:eg2="http://example.com/"
> > prefix declaration in the first example.
> >
> >
> > > > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > > >           xmlns:eg="http://example.org/"
> > > >           xmlns:eg2="http://example.com/">
> > > >   <rdf:Description rdf:about="http://example.org/foo">
> > > >     <eg:bar rdf:parseType="Literal"><eg:br/></eg:bar>
> > > >   </rdf:Description>
> > > > </rdf:RDF>

Sorry, my mistake. This should read:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:eg="http://example.org/"
           xmlns:eg2="http://example.com/">
   <rdf:Description rdf:about="http://example.org/foo">
     <eg:bar rdf:parseType="Literal"><eg2:br/></eg:bar>
   </rdf:Description>
</rdf:RDF>

(i.e. with <eg2:br/> instead of <eg:br/>)


> > > > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > > >           xmlns:eg="http://example.org/">
> > > >   <rdf:Description rdf:about="http://example.org/foo">
> > > >     <eg:bar rdf:parseType="Literal"><eg2:br
> > > > xmlns:eg2="http://example.com/"></eg2:br></eg:bar>
> > > >   </rdf:Description>
> > > > </rdf:RDF>
> > > >
> > > > My reading of the current spec is that both examples produce
> > > > the same graph, and that the canonicalization (and therefore,
> > > > according to the discussion above, the literal value) of
> > > > the literal in the graph is:
> > > >
> > > > "<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"
> > > >
> > > > If this is not true, please tell me what happens in the
> > > > above case.

> > My understanding is that the XML Literal in both cases will
> > come out as:
> >
> > "<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"^^
> > <http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
> >
> > (I have added a linebreak after ^^ just to make sure that
> > no other ones get added)
> >
> > You seem to agree.
>
>Actually no, since the two use different namespace prefixes,
>which I hadn't noticed the first time.

Sorry, my mistake. The intention was to have them be the
same, fixed as per above.


>Apart from that
>they will be the same.

Okay, then with the corrected example, they will be the same, yes?
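
As a quick sanity check of that expectation, the sketch below parses the
two (corrected) documents and compares the expanded names of the elements
inside eg:bar. This is not exclusive canonicalization, only a
demonstration that the prefix binding resolves to the same namespace
wherever it is declared. The variable and function names are mine.

```python
import xml.etree.ElementTree as ET

# First example: eg2 is declared far away, on rdf:RDF.
DOC_OUTER = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/"
         xmlns:eg2="http://example.com/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><eg2:br/></eg:bar>
  </rdf:Description>
</rdf:RDF>"""

# Second example: eg2 is declared on the literal content itself.
DOC_INNER = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/">
  <rdf:Description rdf:about="http://example.org/foo">
    <eg:bar rdf:parseType="Literal"><eg2:br
xmlns:eg2="http://example.com/"></eg2:br></eg:bar>
  </rdf:Description>
</rdf:RDF>"""


def literal_child_names(doc: str) -> list[str]:
    """Return the expanded names ({namespace}local) of eg:bar's children."""
    root = ET.fromstring(doc)
    bar = root.find(".//{http://example.org/}bar")
    return [child.tag for child in bar]


outer_names = literal_child_names(DOC_OUTER)
inner_names = literal_child_names(DOC_INNER)
```

Both lists contain the single name {http://example.com/}br, so the
element in the literal is the same in both documents even though the
declaration sits in different places.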


> > >I don't understand this point or see what the problem is here.
> > >What document must we change to fix it?
> >
> > My guess is that currently, no document needs to change.
> > But I wanted to make sure this was the case, and there were
> > no misunderstandings about canonicalization and context
> > (i.e. in an RDF/XML context, namespace prefix declarations
> > could be far away from the actual literals where they apply.
> > Once canonicalized, that's no longer the case).



Regards,    Martin.
Received on Monday, 4 August 2003 14:20:33 UTC