Re: Possible solutions for ISSUE 87 from Ivan Herman on 2008-03-13 (www-rdf-interest@w3.org from March 2008)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 13 Mar 2008 15:44:16 +0100
To: Mark Birbeck <mark.birbeck@x-port.net>
CC: W3C RDFa task force <public-rdf-in-xhtml-tf@w3.org>, www-rdf-interest@w3.org, "Jeremy J. Carroll" <jjc@hpl.hp.com>
Message-ID: <47D93DC0.2010809@w3.org>
Mark,

I am afraid I diverge a bit from your analysis, although I agree that we 
have to be a bit loose in the final specification.

You refer to the RDF Concepts document which, indeed, refers to XML 
canonicalization. However... let us look, for example, at the RDF/XML 
document[1]. Note that the section on XML Literals[2] does *not* say that 
the XML Literal *in the RDF/XML serialization* must be canonicalized. 
Actually, the example in that very section:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:ex="http://example.org/stuff/1.0/">
   <rdf:Description rdf:about="http://example.org/item01">
     <ex:prop rdf:parseType="Literal"
              xmlns:a="http://example.org/a#"><a:Box required="true">
          <a:widget size="10" />
          <a:grommit id="23" /></a:Box>
     </ex:prop>
   </rdf:Description>
</rdf:RDF>


is not doing this either, because the namespace definition for 'a' is 
_not_ on the XMLLiteral portion but the 'enclosing' XML element (which 
represents a predicate and is not part of the XML Literal object).

What the document says is that *when the encoding is transformed into an 
RDF Graph*, then the canonicalization must be performed. Look at section 
7.2.17 of the grammar production rules[3] which essentially says that. 
I.e., the RDF/XML parser, seeing the code above, must produce an XML 
Literal with the lexical part being:

<a:Box xmlns:a="http://example.org/a#" required="true">
          <a:widget size="10" />
          <a:grommit id="23" /></a:Box>

(note the namespace declaration that has been moved 'down' to the 
<a:Box> element.)
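
To make this concrete, here is a minimal Python sketch of that step. It is 
only an illustration, not how any particular RDF/XML parser works: the 
function name literal_lexical_form is made up for this note, it needs 
Python 3.8 or later, and it relies on xml.etree.ElementTree.canonicalize, 
which implements C14N 2.0 rather than the Exclusive XML Canonicalization 
that RDF Concepts names. For an input this small I would expect the 
namespace placement to come out the same, but please treat that as an 
assumption. The sketch also drops the whitespace that follows </a:Box>, 
as the example above does.

```python
import xml.etree.ElementTree as ET

A_NS = "http://example.org/a#"
EX_NS = "http://example.org/stuff/1.0/"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://example.org/item01">
    <ex:prop rdf:parseType="Literal"
             xmlns:a="http://example.org/a#"><a:Box required="true">
         <a:widget size="10" />
         <a:grommit id="23" /></a:Box>
    </ex:prop>
  </rdf:Description>
</rdf:RDF>
"""


def literal_lexical_form(rdf_xml):
    """Return a canonicalized serialization of the first child of ex:prop."""
    # Keep the author's prefix 'a' when re-serializing (this is a global registry).
    ET.register_namespace("a", A_NS)
    root = ET.fromstring(rdf_xml)
    prop = root.find(f".//{{{EX_NS}}}prop")
    box = prop[0]
    # Drop the whitespace after </a:Box> so we serialize a single element.
    box.tail = None
    # Serializing the subtree on its own forces the xmlns:a declaration,
    # which lived on ex:prop, to be re-declared on a:Box.
    standalone = ET.tostring(box, encoding="unicode")
    # C14N 2.0 (an assumption, see the note above), not Exclusive C14N 1.0.
    return ET.canonicalize(standalone)


print(literal_lexical_form(RDF_XML))
```

Besides moving the namespace declaration, a canonical form also writes 
empty elements as start/end tag pairs, so the printed result will differ 
from the hand-written lexical form above in that respect.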

What this means is that if my RDFa parser produces RDF/XML output, 
the only thing I have to make sure of is that all namespaces *are* defined 
somewhere in the RDF/XML tree, so that an RDF/XML parser is able 
to do a proper job of the canonicalization. One way would be to say that 
we 'dump', as you say, all the namespace information onto all the 
top-level nodes of an XML literal. But, in fact, if I do nothing 
except add the xhtml namespace at the rdf:RDF level, too, this would 
also be perfectly acceptable. Hence a certain looseness in the way we 
would define this thing... But I do not think the canonicalization 
algorithm is supposed to be applied at that point.
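
A small Python sketch of the 'declare it once at the rdf:RDF level' 
option, using the title example from your mail. The RDF/XML text below is 
hand-written for illustration, not the output of a real RDFa processor, 
and the Dublin Core URI bound to dc: is my assumption, since the example 
does not spell it out. It only checks that a namespace-aware XML parser 
resolves <sup> to the XHTML namespace through inheritance.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # assumed binding for the dc: prefix
XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical RDF/XML output: the XHTML namespace appears once, on rdf:RDF.
RDFA_AS_RDFXML = f"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="{DC}"
         xmlns="{XHTML}">
  <rdf:Description rdf:about="">
    <dc:title rdf:parseType="Literal">E = mc<sup>2</sup>: The Most Urgent Problem of Our Time</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def literal_child_tags(rdf_xml):
    """Return the expanded names of the elements inside the dc:title literal."""
    root = ET.fromstring(rdf_xml)
    title = root.find(f".//{{{DC}}}title")
    return [child.tag for child in title]


print(literal_child_tags(RDFA_AS_RDFXML))
```

The element inside the literal carries no declaration of its own, yet the 
parser knows its namespace, which is all a downstream RDF/XML parser 
needs in order to canonicalize the literal later.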

The problem occurs when an RDFa implementation directly produces an RDF 
graph. Then, of course, somebody along the line must perform the 
canonicalization. But that is, in some ways, another matter.
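
For that direct-to-graph case, one possible shape of the extra step is 
sketched below in Python. Everything in it is an assumption made for 
illustration: the function name canonical_literal, the wrapper element 
and its urn:x-wrapper namespace are invented, the in-scope mappings are 
passed in as a plain dict (prefix to URI, with '' for the default 
namespace), and the canonicalization used is the C14N 2.0 implementation 
in Python 3.8+ (xml.etree.ElementTree.canonicalize), not Exclusive XML 
Canonicalization itself. The idea is to wrap the raw inner content in an 
element that carries the in-scope namespaces, canonicalize, and then 
strip the wrapper again.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

WRAPPER_NS = "urn:x-wrapper"  # hypothetical; must not clash with real content


def canonical_literal(inner_xml, in_scope):
    """Canonicalize an XML fragment given its in-scope prefix -> URI mappings.

    The prefix 'w' is reserved for the temporary wrapper in this sketch.
    """
    decls = []
    for prefix, uri in in_scope.items():
        name = f"xmlns:{prefix}" if prefix else "xmlns"
        decls.append(f" {name}={quoteattr(uri)}")
    wrapped = (
        f'<w:wrapper xmlns:w="{WRAPPER_NS}"{"".join(decls)}>'
        f"{inner_xml}</w:wrapper>"
    )
    canonical = ET.canonicalize(wrapped)
    # Only xmlns:w ends up on the wrapper; declarations that the content
    # actually uses are emitted on the elements that need them.
    start = canonical.index(">") + 1
    end = canonical.rindex("</w:wrapper>")
    return canonical[start:end]


print(canonical_literal(
    "E = mc<sup>2</sup>: The Most Urgent Problem of Our Time",
    {"": "http://www.w3.org/1999/xhtml",
     "dc": "http://purl.org/dc/elements/1.1/"},
))
```

With the title example from your mail, the default namespace should end 
up on <sup>, and the unused dc: mapping should not appear at all. Note 
that this still needs an XML parser, which is exactly the burden you 
point out.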

Now let us see what wiser people like Jeremy will tell us:-)

Ivan


[1] http://www.w3.org/TR/rdf-syntax-grammar
[2] http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-XML-literals
[3] http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-productions


Mark Birbeck wrote:
> Hello all,
> 
> During our discussions last week, I suggested that there are a number
> of ways that we could tackle the rdf:XMLLiteral question. However, the
> more I've delved into this, the more I've had to conclude that we
> can't solve it, at least in a very straightforward way.
> 
> I've presented the details below, and I'm also copying to the RDF
> interest list, because I believe there is an issue of interpretation
> here, in relation to RDF Concepts [1], that may impact our resolution.
> (In particular, there may be a view that we can be more liberal than I
> am being, in which case we might be able to add more explicit support
> after all.) I'm also CCing Jeremy because he wrote some interesting
> comments on XML literals in the context of reviewing the early RDFa
> drafts, and if anyone can find a way through this, it will be him! (No
> pressure... ;) )
> 
> 
> CONTEXT
> 
> If we run a Last Call conformant RDFa parser over the following:
> 
>   <h2 property="dc:title" datatype="rdf:XMLLiteral">
>     E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
>   </h2>
> 
> we get an XML literal that obviously contains XHTML, but doesn't have
> the XHTML namespace anywhere.
> 
> To be correct according to RDF Concepts, the parsed output would need to be:
> 
>   <> dc:title
>     "E = mc<sup xmlns="http://www.w3.org/1999/xhtml">2</sup>: ...
>     ... The Most Urgent Problem of Our Time"^^rdf:XMLLiteral .
> 
> Note the addition of the default namespace.
> 
> 
> EXCLUSIVE CANONICALISATION
> 
> The RDF Concepts document says that an XML literal needs to be
> "exclusive Canonical XML". The algorithm for this is obtained from the
> Exclusive XML Canonicalization spec [2], and essentially dictates that
> currently in-scope namespaces must be placed on the apex node, and
> that all 'visibly utilised' namespaces must appear on the most
> appropriate start tag, if that namespace has not been defined on an
> ancestor.
> 
> For example, the Exclusive Canonicalization of this:
> 
>   <div>
>     <svg:rect ...>
>       <xf:input ...>...</xf:input>
>       <img ... />
>     </svg:rect>
>   </div>
> 
> would be this
> 
>   <div xmlns="...">
>     <svg:rect xmlns:svg="..." ...>
>       <xf:input xmlns:xf="..." ...>...</xf:input>
>       <img ... />
>     </svg:rect>
>   </div>
> 
> The root <div> is the 'apex node'.
> 
> 
> PROBLEMS FOR IMPLEMENTATIONS
> 
> The problems that we have with this in RDFa parsers fall into two
> categories; those that simply involve implementing the algorithm, and
> those that relate to the data having to be interpreted as an XPath
> data model.
> 
> 
> PROBLEMS: ALGORITHM
> 
> From the algorithm's point of view, the easy part is that the apex
> node must contain all currently active namespaces; we have these,
> because they are the currently in-scope prefix mappings in our
> processing rules. We could therefore easily 'dump' those onto the apex
> node.
> 
> However, the next part is slightly more tricky, in that any "visibly
> utilised" namespace must be added to the correct start tag, if it's
> not already on an ancestor. Actually, it's stronger than that in that
> the namespace must *not* appear if it has been defined by an ancestor.
> The following would therefore be incorrect:
> 
>   <div xmlns="...">
>     <svg:rect xmlns:svg="..." ...>
>       <xf:input xmlns:xf="..." ...>
>         <xf:label xmlns:xf="..." ...>...</xf:label>
>       </xf:input>
>       <img ... />
>     </svg:rect>
>   </div>
> 
> The reason why this would be 'wrong' (so to speak) is that the XForms
> label element does not need the XForms namespace, since it is already
> present on the XForms input control.
> 
> (As explained at the end, I think this is an unnecessary restriction,
> and has unfortunate consequences.)
> 
> 
> PROBLEMS: XPATH DATA MODEL
> 
> But the bigger problem I foresee, is that the XML literal must be
> processed using the XPath data model, which means sorting out things
> like entities, removing comments, and so on. This seems to imply that
> an RDFa parser would need to support an XML parser, which seems an
> unfortunate requirement.
> 
> 
> ARE THERE ANY EASY SOLUTIONS?
> 
> I'm afraid that I don't believe there are any easy solutions. If we
> explicitly say that we are creating XML literals, then I don't see any
> way that they can't be 'proper' XML literals, as laid down by the RDF
> Concepts document, and that means Exclusive Canonicalisation. In turn,
> that means namespaces have to be sorted out, entities have to be
> encoded/decoded/etc., and so on.
> 
> So...my gut feeling is that RDFa should not 'support' XML literals in
> this release.
> 
> However, we _should_ reserve all of the necessary architecture, such
> as saying that @datatype="rdf:XMLLiteral" is reserved but undefined,
> that @property with no @content but with child elements is undefined,
> and so on.
> 
> Of course, for the sake of producing useful software, implementers
> would be advised to create a 'dumb' XML literal, by simply copying the
> inner content of the child elements. We can say something like "we'll
> look for implementer experience to help guide this part of the spec in
> a future version". But the main point is that I don't think we can say
> we are properly supporting XML literals unless we support Exclusive
> Canonicalisation, and that is quite a burden.
> 
> 
> SIDE NOTES
> 
> My feeling is that this is not a problem of our making, and that XML
> literals are just pretty badly defined. The problem in my view is not
> that they rely on Exclusive Canonicalisation, but that they do so in
> the wrong way.
> 
> Any comparison that takes place between values would have to be achieved
> by parsing those values in an XML parser anyway (as RDF Concepts also
> says), and making a comparison at the level of the infoset. Which
> means that these two fragments of XML would cause a match when
> compared in this way:
> 
>   <div xmlns="...">
>     <svg:rect xmlns:svg="..." ...>
>       <xf:input xmlns:xf="..." ...>
>         <xf:label xmlns:xf="..." ...>...</xf:label>
>       </xf:input>
>       <img ... />
>     </svg:rect>
>   </div>
> 
>   <div xmlns="..." xmlns:svg="..." xmlns:xf="...">
>     <svg:rect ...>
>       <xf:input ...>
>         <xf:label ...>...</xf:label>
>       </xf:input>
>       <img ... />
>     </svg:rect>
>   </div>
> 
> However, the first fragment is not strictly 'exclusively
> canonicalised', due to the extra namespace. So the process should be
> to canonicalise, and then compare.
> 
> But what RDF Concepts does is to say (effectively) that we should
> canonicalise the XML, and then store it. And then later on, if we want
> to compare, we already have the canonicalised form. But the big
> problem with this is that we are no longer able to simply store
> structured mark-up that we want to round-trip, without comparing it to
> anything.
> 
> What RDF Concepts should have done, in my opinion, is used the idea of
> an XML literal to simply indicate the datatype, as a kind of flag, and
> then leave the Exclusive Canonicalisation stuff to the act of
> comparison. If data is simply being stored for later retrieval then
> why go to lots of effort to store it in an 'unambiguous' way? In
> particular, why require that all RDF applications must support an XML
> parser?
> 
> 
> But since this is not in our power to control, I think punting it to a
> future version of RDFa makes some sense. And in the short-term,
> implementers can add 'dumb' support to their parsers.
> 
> 
> (I've not really discussed other possible solutions such as inventing
> our own XHTML datatype, since I think they are the wrong way to go,
> and I didn't get the sense that anyone was completely enthusiastic
> about that route, on the call. But there are some angles to it, if
> people really feel we must have a solution now, rather than postponing
> this to a future version of RDFa.)
> 
> Regards,
> 
> Mark
> 
> [1] <http://www.w3.org/TR/rdf-concepts/>
> [2] <http://www.w3.org/TR/xml-exc-c14n/>
> 
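
On the side note about comparison (canonicalise, then compare), a minimal 
Python sketch of the idea follows. The namespace URIs are placeholders 
standing in for the elided "..." values, the elided attributes are left 
out, the function name same_xml_literal is invented, and the 
canonicalization is again Python's C14N 2.0 implementation (Python 3.8+), 
which I am assuming treats redundant namespace declarations the way 
Exclusive Canonicalization would for these inputs.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the fragments in the mail elide the real ones.
REDUNDANT = (
    '<div xmlns="http://example.org/ns/html">'
    '<svg:rect xmlns:svg="http://example.org/ns/svg">'
    '<xf:input xmlns:xf="http://example.org/ns/xf">'
    '<xf:label xmlns:xf="http://example.org/ns/xf">label</xf:label>'
    "</xf:input><img/></svg:rect></div>"
)
HOISTED = (
    '<div xmlns="http://example.org/ns/html"'
    ' xmlns:svg="http://example.org/ns/svg"'
    ' xmlns:xf="http://example.org/ns/xf">'
    "<svg:rect><xf:input><xf:label>label</xf:label></xf:input>"
    "<img/></svg:rect></div>"
)


def same_xml_literal(a, b):
    """Compare two XML fragments by their canonical serializations."""
    return ET.canonicalize(a) == ET.canonicalize(b)


print(REDUNDANT == HOISTED)                  # plain string comparison
print(same_xml_literal(REDUNDANT, HOISTED))  # comparison after canonicalizing
```

The two strings differ as stored, so a store that keeps the mark-up 
verbatim can still answer comparisons by canonicalizing at comparison 
time.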

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Thursday, 13 March 2008 14:44:54 UTC