Re: Possible solutions for ISSUE 97 from Ivan Herman on 2008-03-20 (public-rdf-in-xhtml-tf@w3.org from March 2008)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 20 Mar 2008 12:12:43 +0100
To: Mark Birbeck <mark.birbeck@x-port.net>
CC: Ben Adida <ben@adida.net>, public-rdf-in-xhtml-tf@w3.org
Message-ID: <47E246AB.9040708@w3.org>

Mark Birbeck wrote:
> Hi Ben,
> 
> There is only one *exclusive* canonical form of your XML, and that is this:
> 
>   <div xmlns="http://www.w3.org/1999/xhtml">
>     foo <b>bar</b>
>   </div>
> 
> I assume what you're trying to say is that (a) if we are entitled to
> serialise this 'abstract' value in two different ways, then (b) what
> difference does it make what we use as the abstract version, meaning
> that (c) we don't need to use canonicalisation to create XML literals.
> 
> First, I don't agree that there are many different serialisations
> possible of the same graph. 

Mark, I think you are wrong on that point.

7.2.17 of the RDF/XML syntax Rec:

http://www.w3.org/TR/rdf-syntax-grammar/

[[[
l [the XMLLiteral, note by Ivan] is transformed into the lexical form of 
an XML literal in the RDF graph x (a Unicode string) by the following 
algorithm. This does not mandate any implementation method — any other 
method that gives the same result may be used.

    1. Use l to construct an XPath[XPATH] node-set (a document subset)
    2. Apply Exclusive XML Canonicalization [XML-XC14N]) with comments 
and with empty InclusiveNamespaces PrefixList to this node-set to give a 
sequence of octets s
    3. This sequence of octets s can be considered to be a UTF-8 
encoding of some Unicode string x (sequence of Unicode characters)
    4. The Unicode string x is used as the lexical form of l
    5. This Unicode string x SHOULD be in NFC Normal Form C[NFC]
]]]
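
The five steps quoted above can be sketched roughly as follows. This is 
only an illustration under stated assumptions, not a conforming 
implementation: the function name xml_literal_lexical_form is 
hypothetical, the input is assumed to be a single well-formed element 
rather than an arbitrary node-set, and Python's standard library 
(3.8 or later) offers C14N 2.0 rather than Exclusive XML 
Canonicalization 1.0, so C14N 2.0 stands in for step 2 here.

```python
import unicodedata
from xml.etree.ElementTree import canonicalize


def xml_literal_lexical_form(fragment: str) -> str:
    """Approximate steps 1-5 for a single well-formed XML element."""
    # Steps 1-2: canonicalize, keeping comments.
    # ASSUMPTION: this is C14N 2.0, used as a stand-in for
    # Exclusive XML Canonicalization 1.0 named in the Rec.
    canonical = canonicalize(fragment, with_comments=True)
    # Step 2 yields a sequence of octets; canonical XML is UTF-8.
    octets = canonical.encode("utf-8")
    # Steps 3-4: read the octets as UTF-8 to get the lexical form.
    lexical = octets.decode("utf-8")
    # Step 5: the string SHOULD be in NFC; report it, do not alter it.
    if not unicodedata.is_normalized("NFC", lexical):
        print("warning: lexical form is not in NFC")
    return lexical
```

Canonicalization sorts attributes and expands empty elements, so 
differently written inputs can end up with the same lexical form.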

Note entry #2. This means that if one has two different serializations 
with XML Literals (like the ones provided by Ben) that, when Exclusive 
XML Canonicalization is applied, lead to the same lexical form, then the 
two serializations express exactly the same RDF graph.
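
As a quick check of that claim, the two literals from Ben's mail can be 
canonicalized and compared. This is a sketch only: it assumes Python 3.8 
or later and again uses the standard library's C14N 2.0 as a stand-in 
for Exclusive XML Canonicalization; the variable names are illustrative.

```python
from xml.etree.ElementTree import canonicalize

XHTML = "http://www.w3.org/1999/xhtml"

# Ben's first literal: the namespace is declared again on <b>.
literal_a = f'<div xmlns="{XHTML}">\n  foo <b xmlns="{XHTML}">bar</b>\n</div>'
# Ben's second literal: the namespace is declared on <div> only.
literal_b = f'<div xmlns="{XHTML}">\n  foo <b>bar</b>\n</div>'

# ASSUMPTION: C14N 2.0 approximates Exclusive C14N for this input.
lexical_a = canonicalize(literal_a, with_comments=True)
lexical_b = canonicalize(literal_b, with_comments=True)

# Equal lexical forms mean both serializations denote the same literal.
same_literal = lexical_a == lexical_b
print(same_literal)
```

The redundant declaration on the inner element does not survive 
canonicalization, which is why the two forms compare equal.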

Again: _you are absolutely right_ that an RDFa to abstract RDF processor
must perform the canonicalization, just as the RDF/XML Rec describes it!
But what I am saying and, I think, what Ben is saying is that this is
inconsequential to RDFa implementations that produce, say, RDF/XML or
Turtle, because the abstract RDFa->Graph machine is then a combination
of two engines: the RDFa->RDF/XML translator and the RDF/XML parser for
whatever RDF implementation you choose. None of the RDF serialization
syntaxes in use requires the user to canonicalize. That means that the
changes to the syntax document are marginal and only require
explanations, and the change to the SPARQL test is again marginal.

I.e., I simply do not believe we really have a major problem here!


Ivan



>                               Given the exclusive canonicalised form I
> just gave, what is it that would add the second namespace declaration?
> I.e., what part of the process of moving from abstract graph to
> concrete serialisation would alter the form of the string of
> characters that represent the abstract data?
> 
> Second, if there was this scope for 'wiggle room'--i.e., that we could
> unilaterally decide whether to canonicalise or not--then what would be
> the point of exclusive canonicalisation in RDF in the first place?
> 
> I think the nature of XML literals and the fact that they are based on
> XML that has been exclusively canonicalised leaves very little room to
> manoeuvre. If we want to do XML literals, we have no choice but to do
> them properly. The alternative is to accept that the data we are
> dealing with is *not* generic XML, but is actually XHTML, and process
> it accordingly. (I'll outline that in a separate post.)
> 
> Regards,
> 
> Mark
> 
> On 20/03/2008, Ben  <ben@adida.net> wrote:
>> Mark Birbeck wrote:
>>  > So however people reply to this view-point, they need to make some
>>  > reference to RDF Concepts, and say why my interpretation *of that* is
>>  > wrong.
>>
>>
>> I think you're framing the problem incorrectly and torturing yourself
>>  into more complexity than needed.
>>
>>  But instead of writing another massive email, let me try to identify
>>  where, in the logical flow, we disagree with one another.
>>
>>  In your flow, here's where I disagree:
>>
>>
>>  >   (1) we run the RDFa parser on an input document,
>>  >   (*) the output of the RDFa parser is RDF
>>  >   (2) we take the output of the parser and stuff it into a triple store,
>>  >   (3) we SPARQL against the triple store.
>>
>>
>> Step (*) is imprecise, in my opinion; it mixes abstract and concrete.
>>  The output of an RDFa parser is, IMO, *a serialization of an RDF graph*.
>>  That is the key difference, because the "RDF Concepts" definition of
>>  XMLLiteral applies to the abstract graph, not to all of the graph's
>>  valid serializations.
>>
>>  Now, help me understand where you disagree with my reasoning. Here are
>>  two RDF N3 *serializations*:
>>
>>  <> dc:title
>>  """<div xmlns="http://www.w3.org/1999/xhtml">
>>    foo <b xmlns="http://www.w3.org/1999/xhtml">bar</b>
>>  </div>"""^^rdf:XMLLiteral .
>>
>>  and
>>
>>  <> dc:title
>>  """<div xmlns="http://www.w3.org/1999/xhtml">
>>    foo <b>bar</b>
>>  </div>"""^^rdf:XMLLiteral .
>>
>>
>>  If I'm reading the XMLLiteral canonicalization process correctly, I
>>  believe that the two examples above are serializations of the same RDF
>>  graph.
>>
>>  Do you agree? If not, why not?
>>
>>
>>  -Ben
>>
> 
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Thursday, 20 March 2008 11:13:13 UTC