RE: Change in definition of RDF literals from Graham Klyne on 2003-05-23 (w3c-rdfcore-wg@w3.org from May 2003)

From: Graham Klyne <gk@ninebynine.org>
Date: Fri, 23 May 2003 11:19:50 +0100
To: <Patrick.Stickler@nokia.com>, <duerst@w3.org>, <jjc@hplb.hpl.hp.com>
Cc: <w3c-rdfcore-wg@w3.org>
Message-Id: <5.1.0.14.2.20030523104957.02ca5fb8@127.0.0.1>

At 10:47 23/05/03 +0300, Patrick.Stickler@nokia.com wrote:
>I.e.
>
>    <foo>&lt;span xml:lang='en'&gt;blargh&lt;/span&gt;</foo>
>
>should then equal and/or entail both
>
>    <foo>&lt;span xml:lang="en"&gt;blargh&lt;/span&gt;</foo>
>
>and
>
>    <foo rdf:parseType="Literal">
>       <span xml:lang="en">blargh</span>
>    </foo>
>
>Note the difference in single and double quotes around 'en',
>which are subject to canonicalization.

[Martin, I'd be interested to hear if this is close to what you were 
suggesting.  In what follows it's effectively fleshed out with three test 
cases.]

Well, I didn't see it quite that way, but since there's no right or wrong 
here all I can do is discuss possibilities.  In order to analyze this, I 
think we need to consider the examples in two steps:

(1) translation from XML to abstract syntax (graph) -- what an RDF/XML 
parser does

(2) entailment between graphs.

So, taking your three cases above, and making them complete RDF statements, 
I would anticipate:

Test case 1:
------------

   <Subj>
    <foo>&lt;span xml:lang='en'&gt;blargh&lt;/span&gt;</foo>
   </Subj>

-->

   <Subj> <foo> "&lt;span xml:lang='en'&gt;blargh&lt;/span&gt;" .

or is it this?:

   <Subj> <foo> "<span xml:lang='en'>blargh</span>" .

(I couldn't find a test case for this, but the amp-in-url/test001 [1][2] 
suggests to me the latter is correct.  I'll assume the latter case for the 
remaining examples.  I think the XML mapping to infoset resolves the 
entity references such as &lt; and &gt;.)

[1] http://www.w3.org/2000/10/rdf-tests/rdfcore/amp-in-url/test001.rdf
[2] http://www.w3.org/2000/10/rdf-tests/rdfcore/amp-in-url/test001.nt
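To illustrate the assumption about entity references, here is a minimal 
Python sketch.  It uses the standard xml.etree.ElementTree parser as a 
stand-in for the XML layer of an RDF/XML parser (an assumption; a real 
RDF/XML parser also needs namespace declarations, which are left out as in 
the examples above).  The name plain_literal is just for illustration.

```python
import xml.etree.ElementTree as ET

# Test case 1 as an XML document (namespaces omitted, as in the examples above).
doc = "<Subj><foo>&lt;span xml:lang='en'&gt;blargh&lt;/span&gt;</foo></Subj>"

# The XML parser resolves &lt; and &gt; before the application sees the text,
# so the element content is already the "unescaped" character sequence.
plain_literal = ET.fromstring(doc).find("foo").text

print(plain_literal)  # <span xml:lang='en'>blargh</span>
```

So the character sequence handed to the RDF layer contains literal '<' and 
'>' characters, which matches the second form above.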

Test case 2:
------------

   <Subj>
    <foo>&lt;span xml:lang="en"&gt;blargh&lt;/span&gt;</foo>
   </Subj>

-->

   <Subj> <foo> "<span xml:lang=\"en\">blargh</span>" .

(using \ to escape quote-in-string -- sorry can't remember correct form)

I see this as containing a different character sequence, so it neither 
entails nor is entailed by the first case.

Test case 3:
------------

   <Subj>
    <foo rdf:parseType="Literal">
       <span xml:lang="en">blargh</span>
    </foo>
   </Subj>

-->

   <Subj> <foo> "<span xml:lang=\"en\">blargh</span>" .

Because this is parseType="Literal", the literal content is canonicalized 
by the parser (C14N writes attribute values in double quotes), so we end up 
with a statement that is entailed by and entails that in test case 2, but 
not test case 1.
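As a rough illustration of what C14N does to the attribute quotes, here is 
a small Python sketch.  It uses xml.etree.ElementTree.canonicalize (Python 
3.8+), which implements C14N 2.0; treating that as a stand-in for whatever 
canonicalization an RDF/XML parser would apply is an assumption, though 
for a fragment this simple the quoting behaviour is the same.

```python
from xml.etree.ElementTree import canonicalize

single_quoted = "<span xml:lang='en'>blargh</span>"
double_quoted = '<span xml:lang="en">blargh</span>'

# Canonicalization normalizes the attribute delimiters to double quotes,
# so both spellings collapse to one character sequence.
canon_single = canonicalize(single_quoted)
canon_double = canonicalize(double_quoted)

print(canon_single == canon_double)   # True
print(canon_single == double_quoted)  # True
print(canon_single == single_quoted)  # False
```

The canonical form coincides with the character sequence of test case 2 
and differs from that of test case 1.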

---------------

So my take on Martin's suggestion is that all (plain) literals are 
character sequences, some of which might just happen to be valid XML 
fragments, and are compared accordingly.  When parseType="Literal" is used, 
the C14N is applied by the parser, so that equivalent XML thus tagged 
resolves to the same literal value.  Entailment, then, is based on plain 
literals simply denoting themselves, without any regard for whether or not 
they were obtained by C14N of XML.
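The entailment step can be sketched in Python as follows.  This is only a 
toy model under stated assumptions: graphs are ground (no blank nodes), so 
simple entailment reduces to a subset check, and each plain literal is 
represented by its character sequence.  The names g1, g2, g3 and entails 
are hypothetical.

```python
# Each graph is a set of (subject, predicate, object) triples; the object
# here is a plain literal, modelled as the bare character sequence.
g1 = {("Subj", "foo", "<span xml:lang='en'>blargh</span>")}   # test case 1
g2 = {("Subj", "foo", '<span xml:lang="en">blargh</span>')}   # test case 2
g3 = {("Subj", "foo", '<span xml:lang="en">blargh</span>')}   # test case 3, after C14N

def entails(g, h):
    """Simple entailment for ground graphs: g entails h iff h is a subset of g."""
    return h <= g

print(entails(g2, g3), entails(g3, g2))  # True True
print(entails(g1, g2), entails(g2, g1))  # False False
print(entails(g1, g3), entails(g3, g1))  # False False
```

Nothing in the comparison looks inside the literal; whether the string was 
produced by C14N or typed in directly makes no difference.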

This, to me, seems like a useful simplification of what we currently have, 
and I'm not aware of any practical application scenario where the small 
differences in entailments thus achieved are likely to be damaging.

#g

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E

Received on Friday, 23 May 2003 07:06:08 UTC