Fwd: Possible solutions for ISSUE 87 from Mark Birbeck on 2008-03-13 (semantic-web@w3.org from March 2008)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Thu, 13 Mar 2008 14:10:10 +0000
To: "Semantic Web" <semantic-web@w3.org>
Message-ID: <a707f8300803130710k2bdbcd5se559364ea1192e9d@mail.gmail.com>
This should have come to this list.

Apologies...

---------- Forwarded message ----------
From: Mark Birbeck <mark.birbeck@x-port.net>
Date: 13 Mar 2008 14:00
Subject: Possible solutions for ISSUE 87
To: W3C RDFa task force <public-rdf-in-xhtml-tf@w3.org>
Cc: www-rdf-interest@w3.org, "Jeremy J. Carroll" <jjc@hpl.hp.com>


Hello all,

 During our discussions last week, I suggested that there are a number
 of ways that we could tackle the rdf:XMLLiteral question. However, the
 more I've delved into this, the more I've had to conclude that we
 can't solve it, at least in a very straightforward way.

 I've presented the details below, and I'm also copying to the RDF
 interest list, because I believe there is an issue of interpretation
 here, in relation to RDF Concepts [1], that may impact our resolution.
 (In particular, there may be a view that we can be more liberal than I
 am being, in which case we might be able to add more explicit support
 after all.) I'm also CCing Jeremy because he wrote some interesting
 comments on XML literals in the context of reviewing the early RDFa
 drafts, and if anyone can find a way through this, it will be him! (No
 pressure... ;) )


 CONTEXT

 If we run a Last Call conformant RDFa parser over the following:

  <h2 property="dc:title" datatype="rdf:XMLLiteral">
    E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
  </h2>

 we get an XML literal that obviously contains XHTML, but doesn't have
 the XHTML namespace anywhere.

 To be correct according to RDF Concepts, the parsed output would need to be:

  <> dc:title
    "E = mc<sup xmlns="http://www.w3.org/1999/xhtml">2</sup>: ...
    ... The Most Urgent Problem of Our Time"^^rdf:XMLLiteral .

 Note the addition of the default namespace.


 EXCLUSIVE CANONICALISATION

 The RDF Concepts document says that an XML literal needs to be
 "exclusive Canonical XML". The algorithm for this is obtained from the
 Exclusive XML Canonicalization spec [2], and essentially dictates that
 currently in-scope namespaces must be placed on the apex node, and
 that all 'visibly utilised' namespaces must appear on the most
 appropriate start tag, if that namespace has not been defined on an
 ancestor.

 For example, the Exclusive Canonicalization of this:

  <div>
    <svg:rect ...>
      <xf:input ...>...</xf:input>
      <img ... />
    </svg:rect>
  </div>

 would be this:

  <div xmlns="...">
    <svg:rect xmlns:svg="..." ...>
      <xf:input xmlns:xf="..." ...>...</xf:input>
      <img ... />
    </svg:rect>
  </div>

 The root <div> is the 'apex node'.


 PROBLEMS FOR IMPLEMENTATIONS

 The problems that we have with this in RDFa parsers fall into two
 categories: those that simply involve implementing the algorithm, and
 those that relate to the data having to be interpreted as an XPath
 data model.


 PROBLEMS: ALGORITHM

 From the algorithm's point of view, the easy part is that the apex
 node must contain all currently active namespaces; we have these,
 because they are the currently in-scope prefix mappings in our
 processing rules. We could therefore easily 'dump' those onto the apex
 node.

 However, the next part is slightly more tricky, in that any "visibly
 utilised" namespace must be added to the correct start tag, if it's
 not already on an ancestor. Actually, it's stronger than that in that
 the namespace must *not* appear if it has been defined by an ancestor.
 The following would therefore be incorrect:

  <div xmlns="...">
    <svg:rect xmlns:svg="..." ...>
      <xf:input xmlns:xf="..." ...>
        <xf:label xmlns:xf="..." ...>...</xf:label>
      </xf:input>
      <img ... />
    </svg:rect>
  </div>

 The reason why this would be 'wrong' (so to speak) is that the XForms
 label element does not need the XForms namespace, since it is already
 present on the XForms input control.

 (As explained at the end, I think this is an unnecessary restriction,
 and has unfortunate consequences.)
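
 As a rough illustration of the rule just described, the following
 Python sketch walks a parsed tree and works out which namespace each
 start tag would need to declare, skipping any that an ancestor has
 already declared. It is a simplification, not an implementation of
 Exclusive Canonicalisation: it tracks namespace URIs rather than
 prefixes, and it ignores the apex-node handling discussed above. The
 urn:example:* URIs and the function names are made up for the
 example, since the fragments in this message elide the real values.

```python
import xml.etree.ElementTree as ET


def namespace_of(name):
    """Return the namespace URI of a Clark-notation name ('{uri}local')."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def declaration_plan(element, rendered=frozenset()):
    """Yield (tag, uris_to_declare) pairs in document order.

    A URI is declared on an element only if the element or one of its
    attributes uses it and no ancestor has already declared it.
    """
    used = {namespace_of(element.tag)}
    used.update(namespace_of(attr) for attr in element.attrib)
    used.discard("")  # names in no namespace need no declaration
    new = used - rendered
    yield element.tag, sorted(new)
    for child in element:
        yield from declaration_plan(child, rendered | new)


# Hypothetical placeholder URIs; the original fragments elide them.
SOURCE = (
    '<div xmlns="urn:example:xhtml" xmlns:svg="urn:example:svg" '
    'xmlns:xf="urn:example:xforms">'
    "<svg:rect><xf:input><xf:label>Name</xf:label></xf:input><img/></svg:rect>"
    "</div>"
)

plan = list(declaration_plan(ET.fromstring(SOURCE)))
for tag, uris in plan:
    print(tag, uris)
```

 In this sketch the xf:label entry comes out with an empty list,
 because its namespace was already declared on the enclosing xf:input.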


 PROBLEMS: XPATH DATA MODEL

 But the bigger problem I foresee is that the XML literal must be
 processed using the XPath data model, which means sorting out things
 like entities, removing comments, and so on. This seems to imply that
 an RDFa parser would need to include an XML parser, which seems an
 unfortunate requirement.
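
 To make the point concrete, here is a small Python sketch showing how
 the text a parser sees differs from the raw characters in the source.
 It uses the standard library's ElementTree, which is not the XPath
 data model but behaves similarly for this purpose: character
 references are resolved and, with the default parser settings,
 comments are dropped. The sample markup is invented for the example.

```python
import xml.etree.ElementTree as ET

# Raw source as it might appear inside an element carrying @property.
RAW = "<p>caf&#233; <!-- editorial note -->&amp; bar</p>"

root = ET.fromstring(RAW)

# Text content after parsing: references resolved, comment gone.
text = "".join(root.itertext())

# Re-serialised form: the comment does not survive the default parser.
serialised = ET.tostring(root, encoding="unicode")

print(text)
print(serialised)
```

 A parser that only copies characters from the source would keep the
 comment and the unresolved references, which is why the requirement
 amounts to needing a real XML parser.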


 ARE THERE ANY EASY SOLUTIONS?

 I'm afraid that I don't believe there are any easy solutions. If we
 explicitly say that we are creating XML literals, then I don't see any
 way that they can't be 'proper' XML literals, as laid down by the RDF
 Concepts document, and that means Exclusive Canonicalisation. In turn,
 that means namespaces have to be sorted out, entities have to be
 encoded/decoded/etc., and so on.

 So...my gut feeling is that RDFa should not 'support' XML literals in
 this release.

 However, we _should_ reserve all of the necessary architecture, such
 as saying that @datatype="rdf:XMLLiteral" is reserved but undefined,
 that @property with no @content but with child elements is undefined,
 and so on.

 Of course, for the sake of producing useful software, implementers
 would be advised to create a 'dumb' XML literal, by simply copying the
 inner content of the child elements. We can say something like "we'll
 look for implementer experience to help guide this part of the spec in
 a future version". But the main point is that I don't think we can say
 we are properly supporting XML literals unless we support Exclusive
 Canonicalisation, and that is quite a burden.
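
 For what it's worth, the 'dumb' approach might look something like
 the Python sketch below, which simply slices out everything between
 the start tag and the end tag of the element. The function name is
 hypothetical, and the slicing is deliberately naive (it would be
 fooled by a '>' inside an attribute value); it is only meant to show
 that the result carries no namespace information at all.

```python
def dumb_xml_literal(element_source: str) -> str:
    """Return the markup between an element's start and end tags, verbatim."""
    start = element_source.index(">") + 1  # end of the start tag
    end = element_source.rindex("</")      # beginning of the end tag
    return element_source[start:end].strip()


SOURCE = (
    '<h2 property="dc:title" datatype="rdf:XMLLiteral">'
    "E = mc<sup>2</sup>: The Most Urgent Problem of Our Time"
    "</h2>"
)

literal = dumb_xml_literal(SOURCE)
print(literal)
```

 The <sup> element in the output has no xmlns declaration, which is
 exactly the gap described in the CONTEXT section.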


 SIDE NOTES

 My feeling is that this is not a problem of our making, and that XML
 literals are just pretty badly defined. The problem in my view is not
 that they rely on Exclusive Canonicalisation, but that they do so in
 the wrong way.

 Any comparison that takes place between values would have to be
 achieved by parsing those values in an XML parser anyway (as RDF
 Concepts also says), and making a comparison at the level of the
 infoset. Which means that these two fragments of XML would cause a
 match when compared in this way:

  <div xmlns="...">
    <svg:rect xmlns:svg="..." ...>
      <xf:input xmlns:xf="..." ...>
        <xf:label xmlns:xf="..." ...>...</xf:label>
      </xf:input>
      <img ... />
    </svg:rect>
  </div>

  <div xmlns="..." xmlns:svg="..." xmlns:xf="...">
    <svg:rect ...>
      <xf:input ...>
        <xf:label ...>...</xf:label>
      </xf:input>
      <img ... />
    </svg:rect>
  </div>

 However, the first fragment is not strictly 'exclusively
 canonicalised', due to the extra namespace. So the process should be
 to canonicalise, and then compare.
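
 A minimal Python sketch of that kind of comparison is below. It
 parses both fragments with the standard library's ElementTree, which
 expands every name to its namespace URI, and then compares the trees
 rather than the strings. The same_infoset helper and the
 urn:example:* URIs are invented for the example, and trimming
 whitespace before comparing text is a simplification rather than
 anything the specs call for.

```python
import xml.etree.ElementTree as ET


def same_infoset(a, b):
    """Compare two parsed elements by expanded name, attributes, text, children."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "").strip() == (b.text or "").strip()
        and (a.tail or "").strip() == (b.tail or "").strip()
        and len(a) == len(b)
        and all(same_infoset(x, y) for x, y in zip(a, b))
    )


# Declarations at the point of use, including a redundant one on xf:label.
FIRST = (
    '<div xmlns="urn:example:xhtml">'
    '<svg:rect xmlns:svg="urn:example:svg">'
    '<xf:input xmlns:xf="urn:example:xforms">'
    '<xf:label xmlns:xf="urn:example:xforms">Name</xf:label>'
    "</xf:input><img/></svg:rect></div>"
)

# All declarations hoisted onto the root element.
SECOND = (
    '<div xmlns="urn:example:xhtml" xmlns:svg="urn:example:svg" '
    'xmlns:xf="urn:example:xforms">'
    "<svg:rect><xf:input><xf:label>Name</xf:label></xf:input><img/></svg:rect>"
    "</div>"
)

matches = same_infoset(ET.fromstring(FIRST), ET.fromstring(SECOND))
print(FIRST == SECOND, matches)
```

 The strings differ but the parsed trees match, which is the sense in
 which the placement of the declarations does not matter once an XML
 parser is involved.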

 But what RDF Concepts does is to say (effectively) that we should
 canonicalise the XML, and then store it. And then later on, if we want
 to compare, we already have the canonicalised form. But the big
 problem with this is that we are no longer able to simply store
 structured mark-up that we want to round-trip, without comparing it to
 anything.

 What RDF Concepts should have done, in my opinion, is to use the idea
 of an XML literal simply to indicate the datatype, as a kind of flag,
 and then leave the Exclusive Canonicalisation stuff to the act of
 comparison. If data is simply being stored for later retrieval then
 why go to lots of effort to store it in an 'unambiguous' way? In
 particular, why require that all RDF applications must support an XML
 parser?


 But since this is not in our power to control, I think punting it to a
 future version of RDFa makes some sense. And in the short-term,
 implementers can add 'dumb' support to their parsers.


 (I've not really discussed other possible solutions such as inventing
 our own XHTML datatype, since I think they are the wrong way to go,
 and I didn't get the sense that anyone was completely enthusiastic
 about that route, on the call. But there are some angles to it, if
 people really feel we must have a solution now, rather than postponing
 this to a future version of RDFa.)

 Regards,

 Mark

 [1] <http://www.w3.org/TR/rdf-concepts/>
 [2] <http://www.w3.org/TR/xml-exc-c14n/>

 --
  Mark Birbeck

  mark.birbeck@x-port.net | +44 (0) 20 7689 9232
  http://www.x-port.net | http://internet-apps.blogspot.com

  x-port.net Ltd. is registered in England and Wales, number 03730711
  The registered office is at:

    2nd Floor
    Titchfield House
    69-85 Tabernacle Street
    London
    EC2A 4RR


Received on Thursday, 13 March 2008 14:10:47 UTC