RE: Change in definition of RDF literals from Patrick.Stickler@nokia.com on 2003-05-28 (w3c-rdfcore-wg@w3.org from May 2003)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 28 May 2003 11:05:52 +0300
To: <duerst@w3.org>, <gk@ninebynine.org>, <jjc@hplb.hpl.hp.com>
Cc: <w3c-rdfcore-wg@w3.org>, <w3c-i18n-ig@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBBE0@trebe006.europe.nokia.com>
> -----Original Message-----
> From: ext Martin Duerst [mailto:duerst@w3.org]
> Sent: 27 May, 2003 18:22
> To: Stickler Patrick (NMP/Tampere); gk@ninebynine.org;
> jjc@hplb.hpl.hp.com
> Cc: w3c-rdfcore-wg@w3.org; w3c-i18n-ig@w3.org
> Subject: RE: Change in definition of RDF literals
> 
> 
> Sorry for being offline so long.
> 
> I have added the I18N WG back into the cc.
> 
> 
> At 12:26 03/05/27 +0300, Patrick.Stickler@nokia.com wrote:
> 
> 
> > > I understood Martin to suggest that this
> > > distinction is
> > > un-needed -- if one accepts this premise, why is round-tripping of
> > > particular syntactic forms of any importance?
> >
> >It comes down to integration of RDF tools with XML tools. Round
> >tripping makes life a lot easier.
> >
> >I would argue that if round-tripping of XML literals is not
> >important, then neither are XML literals, and we can simply
> >have plain literals and applications must employ the necessary
> >escaping/quoting mechanisms for all literals when producing
> >RDF/XML.
> 
> There is very clearly a need for distinguishing real XML from
> textual strings that contain characters that look like XML
> markup, such as those used in examples that discuss XML.
> In this sense, I agree with Patrick and Jeremy.
> 
> But this does NOT imply a necessity for using a special bit
> in a data store. It is easily possible (although not necessary;
> implementations can do what they want internally) to store
> both plain and XML literals in the same escaping form.

The issue here is not about what an application does to store
XML literals internally.

It's about how XML literals are distinguished from plain literals
in the abstract graph syntax. 

In the abstract graph syntax, something like a bit/flag is needed.
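As a minimal sketch of what such a flag amounts to (Python; the names
Literal and is_xml are hypothetical, chosen for illustration, and come
from no RDF specification or library), two literals can share the same
character sequence and still be distinct nodes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical abstract-syntax literal: a string plus a kind flag."""
    lexical: str          # the character sequence of the literal
    is_xml: bool = False  # assumed flag: asserted to be an XML literal


# D1 from the examples: a plain literal whose string looks like markup.
plain = Literal("Hello <em>World!</em>")
# C2 from the examples: the same characters, asserted to be XML.
xml = Literal("Hello <em>World!</em>", is_xml=True)

same_characters = plain.lexical == xml.lexical  # True
same_literal = plain == xml                     # False: the flag differs
```

Without the flag, the two values above would compare equal, which is
the distinction I am concerned about losing.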

> To go back to some examples, in RDF/XML:
> 
> A1) <foo>Hello World!</foo>
> 
> A2) <foo rdf:parseType="Literal">Hello World!</foo>
> 
> B1) <foo>Hello &amp; World!</foo>
> 
> B2) <foo rdf:parseType="Literal">Hello &amp; World!</foo>
> 
> C1) <foo>Hello <em>World!</em></foo>
> 
> C2) <foo rdf:parseType="Literal">Hello <em>World!</em></foo>
> 
> D1) <foo>Hello &lt;em&gt;World!&lt;/em&gt;</foo>
> 
> D2) <foo rdf:parseType="Literal">Hello &lt;em&gt;World!&lt;/em&gt;</foo>
> 
> 
> You will note the following:
> 
> C1) is illegal. A1/A2, B1/B2, and D1/D2 each represent exactly the
> same string, and in XML, this string is represented in the same way.
> In A), it is a string without anything special; in B, it is a string
> containing an ampersand, and in D, it is a string containing '<'s and
> '>'s that make it look like XML markup.
> 
> To store these strings, apart from the fact that in some cases
> it says rdf:parseType="Literal", which we can discuss separately,
> we have various solutions. From what Jeremy told me, currently the
> most frequent solution seems to be:
> 
> A1) Hello World! (#)
> 
> A2) Hello World! (*)
> 
> B1) Hello & World! (#)
> 
> B2) Hello &amp; World! (*)
> 
> C2) Hello <em>World!</em> (*)
> 
> D1) Hello <em>World!</em> (#)
> 
> D2) Hello &lt;em&gt;World!&lt;/em&gt; (*)
> 
> (#) and (*) standing for the differences in escaping that
> is applied in the store, or that has to be applied when writing
> out XML.
>
> But there is another solution:
> 
> A1) Hello World!
> 
> A2) Hello World!
> 
> B1) Hello &amp; World!
> 
> B2) Hello &amp; World!
> 
> C2) Hello <em>World!</em>   (@)
> 
> D1) Hello &lt;em&gt;World!&lt;/em&gt;
> 
> D2) Hello &lt;em&gt;World!&lt;/em&gt;
> 
> In this solution, we don't need to distinguish different escapings,
> because the same escaping is used throughout. For practical speed
> purposes, I have added a (@) mark, which indicates that in this
> case, not using parseType='Literal' when writing out is not an option.
> Of course, C and D are still clearly distinct, and it is not possible
> to get from one to the other via write-out-read-in or entailment.

I don't see how this allows one to reliably distinguish
between literals that are asserted to be XML literals and
literals that just look like XML literals.

Whether you apply escapings or not does not change this.
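To illustrate, here is a rough sketch of the uniform-escaping store as
I read it (Python; store_plain and store_xml are hypothetical helper
names, and escape() is the standard xml.sax.saxutils function). Under
this scheme, B1 and B2, and likewise D1 and D2, end up as identical
stored strings, so the string alone says nothing about whether the
literal was asserted to be XML:

```python
from xml.sax.saxutils import escape


def store_plain(text: str) -> str:
    """Assumed behaviour: escape &, < and > in a plain literal before storing."""
    return escape(text)


def store_xml(markup: str) -> str:
    """Assumed behaviour: keep parseType="Literal" content exactly as written."""
    return markup


# B1 is the plain string "Hello & World!"; B2 is the XML content "Hello &amp; World!".
b1 = store_plain("Hello & World!")
b2 = store_xml("Hello &amp; World!")

# D1 is a plain string that looks like markup; D2 is XML content with escaped brackets.
d1 = store_plain("Hello <em>World!</em>")
d2 = store_xml("Hello &lt;em&gt;World!&lt;/em&gt;")

b_indistinguishable = b1 == b2  # True
d_indistinguishable = d1 == d2  # True
```

Any distinction between the plain and the XML member of each pair has
to be recorded somewhere other than the stored string.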

> The second implementation also makes it very easy to support
> parseType='Literal', and makes it quite natural to not make
> a big distinction between plain literal and XML literals, a
> plain literal just being a special case of an XML literal without
> any element markup. 

I don't believe this is correct. A plain literal is *not* just
an XML literal without any element markup. It is a character
sequence that is in no way constrained by any XML well-formedness
rules at all. It might very well contain XML markup, but is not known
to be well formed.
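A rough way to see the difference (Python, using the standard
xml.etree.ElementTree parser; the helper name is_well_formed_content
and the wrapper element are assumptions made for this sketch, and the
check is deliberately naive): a string is acceptable as a plain literal
whether or not it parses as XML content.

```python
import xml.etree.ElementTree as ET


def is_well_formed_content(text: str) -> bool:
    """Naive check: does text parse as the content of a single wrapper element?

    Hypothetical helper for illustration only; a string containing the
    wrapper's own end tag could fool it.
    """
    try:
        ET.fromstring(f"<wrapper>{text}</wrapper>")
        return True
    except ET.ParseError:
        return False


# All three are perfectly good plain literals; only the first could
# also be the content of an XML literal.
balanced = is_well_formed_content("Hello <em>World!</em>")
unbalanced = is_well_formed_content("Hello <em>World!")
bare_ampersand = is_well_formed_content("AT&T")
```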

> Of course, we can still make a strict
> distinction between plain literals and XML literals if we really
> think we need to, but this is independent of escaping issues.

I agree with you there.

> Escaping issues only depend on the internals of an application,
> not on the distinction between plain literals and XML literals.

I myself have not been talking at all about any application
internals, but about the abstract graph syntax, and the distinction
between plain and XML literals, which IMO must be captured in
the abstract syntax.

I think it's clear that trying to unify plain and XML literals
raises far too many questions at this stage of the process to
consider it seriously.

If you can live with it, I propose that we stick with the WG's
decision to remove lang tags from all typed literals and to treat
rdfs:XMLLiteral as a non-special, normal datatype (with no 
lang tag).

Patrick
Received on Wednesday, 28 May 2003 04:06:05 UTC