Re: Test cases: XML Literal value space and exclusive canonicalization from Dave Beckett on 2003-08-04 (w3c-rdfcore-wg@w3.org from August 2003)

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Mon, 4 Aug 2003 17:49:56 +0100
To: Martin Duerst <duerst@w3.org>
Cc: www-rdf-comments@w3.org, pat hayes <phayes@ihmc.us>, Benja Fallenstein <b.fallenstein@gmx.de>, Jeremy Carroll <jjc@hplb.hpl.hp.com>, w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org, msm@w3.org
Message-Id: <20030804174956.74032aff.dave.beckett@bristol.ac.uk>
On Mon, 04 Aug 2003 10:55:01 -0400
Martin Duerst <duerst@w3.org> wrote:

> Hello Dave,
> 
> Many thanks for your quick and detailed reply!

Let me cut this down a bit then :)

<snip/>
> Okay. I was trying to ask this because I assume that in all
> cases except XML Literals, the syntax allowed in RDF/XML is
> that defined by the lexical space of the datatype (modulo
> XML character escaping). Is this the case?

In RDF/XML, the lexical space that you can write into XML is constrained
by XML's alphabet - a subset of Unicode defined in the particular XML
specification being used.

The lexical space of RDF literals (including the datatyped literals)
is a Unicode string (sequence of Unicode characters).

I think we've worked out that these are not the same - some characters
in a Unicode string cannot be written in XML.

So, RDF/XML doesn't define it - either the XML specs
do, or the RDF abstract syntax does (definition of literals).
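
A minimal sketch of that mismatch, assuming XML 1.0 is the XML
specification in use (the Char production there allows #x9, #xA, #xD,
#x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF). The function names
are hypothetical, chosen only for illustration:

```python
def is_xml10_char(ch):
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def unwritable_in_xml10(lexical_form):
    """Return the characters of a Unicode string that XML 1.0 cannot carry."""
    return [ch for ch in lexical_form if not is_xml10_char(ch)]


# A literal containing U+0000 or U+000C is a fine Unicode string,
# but neither character can appear in an XML 1.0 document.
print(unwritable_in_xml10("a\x00b\x0c"))
```

A different XML version would change the ranges, which is why the
limit comes from the XML specification rather than from RDF/XML.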

<snip/>

> >The "content of an element" is not in the graph (there are no elements
> >in the abstract syntax) and is not the lexical form
> 
> I now understand that for XML Literals. What about all the other
> datatype literals?

Same thing.  For example, the XSD integer 2 is not in the graph either -
RDF doesn't have such integers in its abstract syntax.  So the xsd:int
rules are used to encode that datatype integer as a Unicode string (I
hope, or I'm lost).

In the RDF/XML, that Unicode string lexical form turns into a sequence
of Unicode characters (character InfoItems). These infoset items are
written in XML as character data, in some content encoding.
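
As a rough illustration of that last step, here is a sketch that writes
a lexical form as escaped character data in a given encoding and reads
it back. The <value> wrapper element and the function name are
assumptions made for the example; they are not RDF/XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def write_and_read_back(lexical_form, encoding):
    """Serialize a lexical form as XML character data, then parse it again."""
    # Escape &, < and > so the string is legal character data.
    document = (
        f'<?xml version="1.0" encoding="{encoding}"?>'
        f"<value>{escape(lexical_form)}</value>"
    )
    # The octets differ per encoding; the parsed characters do not.
    octets = document.encode(encoding)
    return ET.fromstring(octets).text


for enc in ("utf-8", "iso-8859-1"):
    print(enc, repr(write_and_read_back("a < b & caf\u00e9", enc)))
```

Whatever encoding is used on the wire, the parser hands back the same
sequence of Unicode characters, and that sequence is the lexical form.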

<snip/>

> >This particular part of <br/> exc-canonicalizing to octets equivalent to
> >the Unicode "<br></br>" doesn't happen to be tested in our test cases,
> >but we are not providing an exc-C14N test suite.  I can add it.
> 
> I agree that it would be a bad idea to try to provide an exc-C14N
> test suite. I think it would be good to add an example like this
> just to document how RDF/XML syntax, lexical value, and so on,
> are related, and in particular, that they are not exactly the same.

OK, noted.
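
A quick way to see the <br/> case, as a sketch only: Python 3.8+ ships
xml.etree.ElementTree.canonicalize(), which implements C14N 2.0 rather
than the exclusive canonicalization the RDF specs refer to. For this
simple case the point being illustrated (an empty-element tag becomes a
start/end tag pair) is the same.

```python
import xml.etree.ElementTree as ET

# Canonical XML never uses the empty-element tag form.
canonical = ET.canonicalize("<br/>")
print(canonical)

# Canonical XML is defined as UTF-8 octets.
octets = canonical.encode("utf-8")
print(octets)
```

So the concrete syntax "<br/>" and the lexical form in the graph are
related by canonicalization but are not character-for-character equal.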

> >They are the same triple.  XML Canonicalization happens in mapping from
> >the concrete syntax to the abstract.
> >
> >So that means there is no problem with A).
> 
> Very good, many thanks for the confirmation.

<snip/>

> > > Now to B)
<snip/>
> >illegal is vague.  It is legal XML, legal RDF/XML.  However
> >in the graph it might be an ill-formed XML literal (PatH will
> >have the right term).
> 
> Okay. Are there examples of 'ill-formed' other literals in the
> test suite? If yes, it may be appropriate to add this one.

Yes.  We have tests such as "010" xsd:int as a bad datatyped literal.
The phrase we are using is ill-typed, at which point the interpretation
in the semantics is different.

See near (editor's draft, take care)
  http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-mt-20030117/#illformedliteral

There are several tests below http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/
but one is:
  "With appropriate datatype knowledge, a 'badly-formed' datatyped literal can be detected."
  http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/Manifest.rdf#non-well-formed-literal-2
which checks that a bad integer "flargh"
  http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/test002.nt
does not conclude that it is an RDF datatype
  http://www.w3.org/2000/10/rdf-tests/rdfcore/datatypes/test002b.nt

These are not required tests; only if the particular datatype (in this case XSD)
is supported by the application.
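
For illustration, a minimal sketch of the kind of datatype knowledge
involved, assuming the lexical space of xsd:integer is an optional sign
followed by decimal digits. The function name is hypothetical and this
is not the test harness itself:

```python
import re

# Assumed lexical space of xsd:integer: optional sign, then digits.
XSD_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_well_typed_xsd_integer(lexical_form):
    """True if the string is in the (assumed) xsd:integer lexical space."""
    return XSD_INTEGER_LEXICAL.fullmatch(lexical_form) is not None


print(is_well_typed_xsd_integer("25"))
print(is_well_typed_xsd_integer("flargh"))
```

An application without XSD support has no such check available, which
is why these tests are optional.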


<snip/>
> >It might make sense to forbid rdf:datatype with the URI of rdf:XMLLiteral
> >for the reason you give - to make things easier for the parser.  Do
> >you feel it makes things easier for the user too?
> 
> Here are some thoughts I have gone through:
> - It makes things somewhat different for software writing RDF/XML:
>    It can't just write out all types with rdf:datatype. But this is
>    probably a desirable effect.
> - For users, there are really a lot of users out there, and it's
>    not very easy to say in general. But in my view, it very much helps
>    them understand XML Literals if they see these literals always
>    at the same level of escaping. Most people seem to get confused
>    very quickly with different escaping levels. Using only
>    rdf:parseType='Literal' would mean that the basic escaping is
>    the same in the RDF/XML syntax, in the abstract syntax, and,
>    as far as I understand, in most implementations. I think
>    that is a serious benefit.
> 
> 
> >If we do ban it, that would mean no problem with B), yes?
> 
> Yes, that means that problem B is gone.

Your summary of user issues there seems appropriate.  Encoded XML does
look ugly too!
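
To make the two escaping levels concrete, here is a sketch that builds
the same property element both ways. The eg:bar property is borrowed
from the examples in this thread; the variable names are assumptions
for illustration only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
lexical_form = "<br></br>"

# With rdf:datatype the element content is text, so the markup
# must be escaped one extra level.
typed_form = (
    f'<eg:bar xmlns:eg="http://example.org/" xmlns:rdf="{RDF_NS}" '
    f'rdf:datatype="{RDF_NS}XMLLiteral">{escape(lexical_form)}</eg:bar>'
)

# With rdf:parseType="Literal" the markup appears as written.
literal_form = (
    f'<eg:bar xmlns:eg="http://example.org/" xmlns:rdf="{RDF_NS}" '
    f'rdf:parseType="Literal">{lexical_form}</eg:bar>'
)

print(typed_form)
print(literal_form)

# An XML parser sees text in the first case and a child element in the second.
typed_text = ET.fromstring(typed_form).text
literal_children = [child.tag for child in ET.fromstring(literal_form)]
print(repr(typed_text), literal_children)
```

The first form is the encoded XML that looks ugly; the second keeps the
same escaping level as the lexical form.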


> > > The third issue, C), is about context information for
> > > rdf:parseType="Literal". The following two test documents
> > > illustrate the situation:
> >
> >What is context?
> 
> Sorry to not be clear enough. By context, I meant everything
> outside the actual element content that represents the literal
> value. In particular the xmlns:eg2="http://example.com/"
> prefix declaration in the first example.
> 
> 
> > > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > >           xmlns:eg="http://example.org/"
> > >           xmlns:eg2="http://example.com/">
> > >   <rdf:Description rdf:about="http://example.org/foo">
> > >     <eg:bar rdf:parseType="Literal"><eg:br/></eg:bar>
> > >   </rdf:Description>
> > > </rdf:RDF>
> > >
> > > <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > >           xmlns:eg="http://example.org/">
> > >   <rdf:Description rdf:about="http://example.org/foo">
> > >     <eg:bar rdf:parseType="Literal"><eg2:br
> > > xmlns:eg2="http://example.com/"></eg2:br></eg:bar>
> > >   </rdf:Description>
> > > </rdf:RDF>
> > >
> > > My reading of the current spec is that both examples produce
> > > the same graph, and that the canonicalization (and therefore,
> > > according to the discussion above, the literal value) of
> > > the literal in the graph is:
> > >
> > > "<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"
> > >
> > > If this is not true, please tell me what happens in the
> > > above case.
> >
> >The whitespace is different in your examples and is significant.
> >Assuming that is a mistake, then apart from that, both lexical values
> >are as given above.
> 
> There is a linebreak instead of a space after <eg2:br in the
> second <rdf:RDF> piece. Is that what you meant? This was introduced
> by my mailer, which doesn't like long lines. Canonicalization should
> turn that back into a space again. Whitespace is significant in
> element content, for good reasons, but not inside start tags.

Oops, my mistake, yes I agree that is not a significant space
and the C14N will do as you say.
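
A small check of this, as a sketch: it uses Python 3.8+
xml.etree.ElementTree.canonicalize(), which implements C14N 2.0 and not
exc-C14N, so treat it as an approximation. The whitespace handling
shown here is the part the two share.

```python
import xml.etree.ElementTree as ET

with_space = '<eg2:br xmlns:eg2="http://example.com/"></eg2:br>'
with_linebreak = '<eg2:br\nxmlns:eg2="http://example.com/"></eg2:br>'

# Whitespace inside a start tag is not significant: both canonicalize alike.
same_in_tag = ET.canonicalize(with_space) == ET.canonicalize(with_linebreak)
print(same_in_tag)

# Whitespace in element content is significant and is preserved.
same_in_content = ET.canonicalize("<a> <b/></a>") == ET.canonicalize("<a><b/></a>")
print(same_in_content)
```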

> My understanding is that the XML Literal in both cases will
> come out as:
> 
> "<eg2:br xmlns:eg2="http://example.com/"></eg2:br>"^^
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
> 
> (I have added a linebreak after ^^ just to make sure that
> no other ones get added)
> 
> You seem to agree.

Actually no: the two examples use different namespace prefixes,
which I hadn't noticed the first time.  Apart from that
they will be the same.  Did you mean to move the
namespace declaration and change the name of the element?

> > > This example shows that while in the literal value
> > > (based on canonicalization), the context (in particular
> > > namespace declarations) is internalized as described by
> > > Pat, in the RDF/XML syntax, this does not have to be
> > > the case.
> >
> >I don't understand this point or see what the problem is here.
> >What document must we change to fix it?
> 
> My guess is that currently, no document needs to change.
> But I wanted to make sure this was the case, and there were
> no misunderstandings about canonicalization and context
> (i.e. in an RDF/XML context, namespace prefix declarations
> could be far away from the actual literals where they apply.
> Once canonicalized, that's no longer the case).

I'm not sure if there is an issue there since if the namespace
prefixes are intended to be different - and exc-C14N doesn't
rename prefixes - the lexical forms will be different.
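
A sketch of why the prefixes matter, again using Python 3.8+
xml.etree.ElementTree.canonicalize() (C14N 2.0, standing in for
exc-C14N). The namespace declarations are written directly on each
element here, which is an assumption made so that each fragment can be
parsed on its own.

```python
import xml.etree.ElementTree as ET

# Literal content of the first example, with its in-scope declaration inlined.
first = ET.canonicalize('<eg:br xmlns:eg="http://example.org/"/>')

# Literal content of the second example.
second = ET.canonicalize('<eg2:br xmlns:eg2="http://example.com/"></eg2:br>')

# Same namespace URI as the second, but a different prefix.
other_prefix = ET.canonicalize('<eg:br xmlns:eg="http://example.com/"/>')

print(first)
print(second)
print(other_prefix)

# Prefixes are kept as written, so all three strings differ.
print(first != second, second != other_prefix)
```

Since the canonical string is the lexical form, differing prefixes give
differing literals even when the namespace URI is the same.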

Dave
Received on Monday, 4 August 2003 12:50:23 UTC