RE: Change in definition of RDF literals from Martin Duerst on 2003-05-27 (w3c-rdfcore-wg@w3.org from May 2003)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 27 May 2003 11:22:17 -0400
To: <Patrick.Stickler@nokia.com>, <gk@ninebynine.org>, <jjc@hplb.hpl.hp.com>
Cc: <w3c-rdfcore-wg@w3.org>, w3c-i18n-ig@w3.org
Message-Id: <4.2.0.58.J.20030527105443.072b2590@localhost>
Sorry for being offline so long.

I have added the I18N WG back into the cc.


At 12:26 03/05/27 +0300, Patrick.Stickler@nokia.com wrote:


> > I understood Martin to suggest that this
> > distinction is
> > un-needed -- if one accepts this premise, why is round-tripping of
> > particular syntactic forms of any importance?
>
>It comes down to integration of RDF tools with XML tools. Round
>tripping makes life alot easier.
>
>I would argue that if round-tripping of XML literals is not
>important, then neither are XML literals, and we can simply
>have plain literals and applications must employ the necessary
>escaping/quoting mechanisms for all literals when producing
>RDF/XML.

There is very clearly a need for distinguishing real XML from
textual strings that contain characters that look like XML
markup, such as this is used in examples that discuss XML.
In this sense, I agree with Patrick and Jeremy.

But this does NOT imply a necessity for using a special bit
in a data store. It is easily possible (although not necessary;
implementations can do what they want internally) to store
both plain and XML literals in the same escaping form.

To go back to some examples, in RDF/XML:

A1) <foo>Hello World!</foo>

A2) <foo rdf:parseType="Literal">Hello World!</foo>

B1) <foo>Hello &amp; World!</foo>

B2) <foo rdf:parseType="Literal">Hello &amp; World!</foo>

C1) <foo>Hello <em>World!</em></foo>

C2) <foo rdf:parseType="Literal">Hello <em>World!</em></foo>

D1) <foo>Hello &lt;em&gt;World!&lt;/em&gt;</foo>

D2) <foo rdf:parseType="Literal">Hello &lt;em&gt;World!&lt;/em&gt;</foo>


You will note the following:

C1) is illegal. A1/A2, B1/B2, and D1/D2 each represent exactly the
same string, and in XML, this string is represented in the same way.
In A), it is a string without anything special; in B, it is a string
containing an ampersand, and in D, it is a string containing '<'s and
'>'s that lets it look like XML markup.

To store these strings, apart from the fact that in some cases
it says rdf:parseType="Literal", which we can discuss separately,
we have various solutions. From what Jeremy told me, currently the
most frequent solution seems to be:

A1) Hello World! (#)

A2) Hello World! (*)

B1) Hello & World! (#)

B2) Hello &amp; World! (*)

C2) Hello <em>World!</em> (*)

D1) Hello <em>World!</em> (#)

D2) Hello &lt;em&gt;World!&lt;/em&gt; (*)

(#) and (*) standing for the differences in escaping that
is applied in store, or that has to be applied when writing
out XML.

But there is another solution:

A1) Hello World!

A2) Hello World!

B1) Hello &amp; World!

B2) Hello &amp; World!

C2) Hello <em>World!</em>   (@)

D1) Hello &lt;em&gt;World!&lt;/em&gt;

D2) Hello &lt;em&gt;World!&lt;/em&gt;

In this solution, we don't need to distinguish different escapings,
because the same escaping is used throughout. For practical speed
purposes, I have added a (@) mark, which indicates that in this
case, not using parseType='Literal' when writing out is not an option.
Of course, C and D are still clearly distinct, and it is not possible
to get from one to the other via write-out-read-in or entailment.

The second implementation also makes it very easy to support
parseType='Literal', and makes it quite natural to not make
a big distinction between plain literal and XML literals, a
plain literal just being a special case of an XML literal without
any element markup. Of course, we can still make a strict
distinction between plain literals and XML literals if we really
think we need to, but this is independent of escaping issues.
Escaping issues only depend on the internals of an application,
not on the distinction between plain literals and XML literals.


Regards,    Martin.
Received on Tuesday, 27 May 2003 11:35:56 UTC