Re: rdf:parseType="Literal" and XML Fragment interchange from Jeremy Carroll on 2001-08-23 (w3c-rdfcore-wg@w3.org from August 2001)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 23 Aug 2001 20:04:17 +0200
To: <w3c-rdfcore-wg@w3.org>
Cc: <www-xml-fragment-comments@w3.org>
Message-ID: <MABBLGKMPIJFCKFGDBEPAEGHCAAA.jjc@hplb.hpl.hp.com>
I am unconvinced that XML Fragment Interchange addresses RDF's needs.

While the fragment context information is relevant, there is no canonicalisation.

Fragments are represented as XML document(s), which approximate to text
strings.

Hence, under that representation, the files within each of the following
tests are distinguishable from one another.

All should be processed using the same base URL (if this offends, globally
replace ":Description" with ":Description
rdf:about='http://example.org/parseTypeEqualsLiteral'", where rdf is the
appropriately bound namespace prefix).

test0001 shows that an empty element expressed as one tag is distinguishable
from one expressed as two;
test0002 shows that attribute order matters;
test0002c shows that whitespace within a tag matters;
test0003 shows that comments are not stripped;
test0004 shows that namespace bindings are relevant (the attribute's meaning
may be changed by the choice of prefix for the RDF namespace!);
test0005 shows that all namespace bindings are significant, even though
neither binding is referred to in the literal.


=== test0001a.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo></foo>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0001b.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo/>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0002a.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo a="a" b="b"/>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0002b.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo b="b" a="a"/>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0002c.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo a="a"     b="b"/>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0003a.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo></foo>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0003b.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo><!-- a comment --></foo>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0004a.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo a="x:b"></foo>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0004b.rdf
<x:RDF xmlns:x="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <x:Description>
      <x:value x:parseType="Literal" >
          <foo a="x:b"></foo>
      </x:value>
   </x:Description>
</x:RDF>

=== test0005a.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo></foo>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>

=== test0005b.rdf
<x:RDF xmlns:x="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <x:Description>
      <x:value x:parseType="Literal" >
          <foo></foo>
      </x:value>
   </x:Description>
</x:RDF>

===========
End of tests.
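
The namespace context that separates the two files of test0004 can be made
visible with a short script. This is only a sketch, assuming Python 3 and its
standard xml.etree.ElementTree module; the helper name in_scope_prefixes is
made up for illustration and belongs to no RDF tool. It lists the prefix
bindings in scope at the <foo> element, which is the context a FRAG-style
representation would have to carry along with the literal.

```python
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# test0004a.rdf and test0004b.rdf, copied from the tests above.
TEST0004A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description>
      <rdf:value rdf:parseType="Literal" >
          <foo a="x:b"></foo>
      </rdf:value>
   </rdf:Description>
</rdf:RDF>"""

TEST0004B = """<x:RDF xmlns:x="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <x:Description>
      <x:value x:parseType="Literal" >
          <foo a="x:b"></foo>
      </x:value>
   </x:Description>
</x:RDF>"""


def in_scope_prefixes(doc, tag):
    """Return the prefix -> URI bindings in scope at the first <tag> element.

    Hypothetical helper for illustration only.
    """
    scope = []  # stack of (prefix, uri) pairs currently in scope
    events = ("start-ns", "end-ns", "start")
    source = io.BytesIO(doc.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            scope.append(payload)   # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            scope.pop()             # leaving the declaring element
        elif payload.tag == tag:    # "start" event: payload is the element
            return dict(scope)
    return None


bindings_a = in_scope_prefixes(TEST0004A, "foo")
bindings_b = in_scope_prefixes(TEST0004B, "foo")
print(bindings_a)
print(bindings_b)
```

In test0004a the prefix x is unbound at <foo>, whereas in test0004b it is bound
to the RDF namespace, so an application that reads a="x:b" as a qualified name
sees two different things even though the literal text is identical.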

My belief is that to progress the literal representation issue we need to
first consider test cases like these. We can consider them against a number
of proposals, e.g.

STRING:  the literal is represented precisely by the string in the source
document
FRAG:    the literal is represented by a (to be defined) representation
conformant with XML Fragment Interchange specification.
CANON:   the literal is represented by the XML Canonicalisation of the
string in the source document.
INFOSET:    the literal is represented by something from which a string can
be derived which, when inserted into the source document in place of the
original string, leaves the XML Infoset of the source document unchanged.
NODESET:    the literal is represented by something from which a string can
be derived which, when inserted into the source document in place of the
original string, leaves the XPath nodeset of the source document unchanged.
CANONINFO:  the literal is represented by a canonical representation of the
infoset of the original string. Note: defining such a representation is
quite hard and has not been done.

We note that both STRING and FRAG are special cases of INFOSET; and CANON is
a special case of NODESET.

The truth table for the tests above is as follows:

test:           1    2    3    4    5
STRING          f    f    f    t    t
FRAG            f    f    f    f    f
CANON           t    t    f    f    f
INFOSET         -    -    f    t*   t*
NODESET         -    -    f    -    -
CANONINFO       t    t    f    t*   t*

I am unsure about the four starred entries.
A t shows that the files of a test produce the same model, an f shows that
they produce different models, and a - means that implementations may
produce either result.
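
The STRING and CANON rows for tests 1 to 3 can be checked roughly with a few
lines of Python. This is a sketch under stated assumptions, not a definitive
implementation: it uses xml.etree.ElementTree.canonicalize (Python 3.8 or
later), which implements C14N 2.0 rather than the Canonical XML 1.0 of 2001,
and it looks at each literal in isolation, so the inherited namespace
bindings of tests 4 and 5 are not modelled. The names PAIRS, same_as_string
and same_as_canon are invented for this example.

```python
from xml.etree.ElementTree import canonicalize

# Literal content of each test pair, taken out of its RDF context.
PAIRS = {
    "test0001": ("<foo></foo>", "<foo/>"),
    "test0002": ('<foo a="a" b="b"/>', '<foo b="b" a="a"/>'),
    "test0002c": ('<foo a="a" b="b"/>', '<foo a="a"     b="b"/>'),
    "test0003": ("<foo></foo>", "<foo><!-- a comment --></foo>"),
}


def same_as_string(a, b):
    """STRING: literals are equal only if the source text is identical."""
    return a == b


def same_as_canon(a, b, with_comments=True):
    """CANON (approximated with C14N 2.0): compare canonical serialisations."""
    return (canonicalize(a, with_comments=with_comments)
            == canonicalize(b, with_comments=with_comments))


# Map each test name to (STRING result, CANON result).
results = {name: (same_as_string(a, b), same_as_canon(a, b))
           for name, (a, b) in PAIRS.items()}

for name, (string_eq, canon_eq) in results.items():
    print(name, "STRING:", "t" if string_eq else "f",
          "CANON:", "t" if canon_eq else "f")
```

With comments retained this agrees with the f/f/f and t/t/f entries above.
Passing with_comments=False to same_as_canon makes the test0003 pair compare
equal, which shows that the entry for test 3 depends on which flavour of
canonicalisation is chosen.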

If this is seen as a positive way forward, I can produce some more examples
in early September, in time for the RDF Core WG teleconference on Sept 7.

An argument against this approach is that the current M&S spec specifically
excludes testing for equality on such XML literals; in my view, this is
because that spec explicitly ducked doing these properly, and one of the
clarifications we are expected to make would allow for equality testing.

As I see it, the heart of the problem is the question of what a piece of XML
means. The answer is that it is application dependent, and we should not try
to second-guess which parts of the Infoset the application will look at; but
the application may not look at things outside the Infoset. However, it is
plausible to take a well-defined subset of the Infoset, in particular a
subset blessed by some other W3C WG (such as the XPath nodeset).

Jeremy Carroll
HP Labs Bristol
Received on Thursday, 23 August 2001 13:55:11 UTC