Re: Possible solutions for ISSUE 97 from Ivan Herman on 2008-03-20 (public-rdf-in-xhtml-tf@w3.org from March 2008)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 20 Mar 2008 10:03:54 +0100
To: Mark Birbeck <mark.birbeck@x-port.net>
CC: Ben Adida <ben@adida.net>, public-rdf-in-xhtml-tf@w3.org
Message-ID: <47E2287A.9000206@w3.org>

Ben, Mark,

I do not think the difference of opinion is as big as it looks...

My reading of the RDF concepts and its consequences on RDFa is:

- If an implementation produces an RDF Graph (ie, a triple store) of its 
own in, say, memory then, yes, indeed, the canonicalization has to be 
done (in theory). It is _exactly_ the same as for *any* RDF 
serialization parser that produces an RDF Graph, like Jena, RDFLib, 
Redland, you name it. This does not really need any extra statement in
the RDFa document, because, well, that is how RDF defines what an
RDF XMLLiteral is.

(Note that checking those implementations with our test harness becomes 
problematic because the SPARQL endpoints need a _serialized_ form of the 
Graph to process. Ie, those implementations can be tested only if they
also provide SPARQL processing on top of that triple store!)

- If an implementation of RDFa is, effectively, a translator from one 
serialization (ie, RDFa) to another (ie, Turtle or RDF/XML) then this is 
not an issue as long as the XMLLiteral is well defined in terms of
the XML Infoset (ie, if possible canonicalizations *performed by a third
party on that serialized Literal* are identical).

Yes, as Ben said, that means putting the necessary xmlns attributes and 
xml:lang attributes wherever that is necessary. This means adjusting our
examples and, possibly, adding an extra warning note into the text
somewhere to make this clear to other implementers.

- Our SPARQL tests are also written down in a serialization format, 
which is the SPARQL language, and that language does _not_ require
XML Literals to be written down in canonical form either. It is the job of
the SPARQL processor to canonicalize them and to do the matching on the
results (both on the pattern given in the query and on the dataset given to
the processor via a URI, which, in general, refers to a serialized dataset).
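
To make the translator case above concrete, here is a minimal sketch in
Python. It is not taken from any of the implementations mentioned in this
thread; the helper name xml_literal_term is made up for illustration. It
only shows that the XMLLiteral is written out as a string carrying its own
xmlns declaration, with no canonicalization performed by the translator
itself.

```python
# Hypothetical helper: write an XML fragment as an N-Triples/Turtle-style
# typed literal. No canonicalization happens here; that is left to whoever
# loads the serialized graph.
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def xml_literal_term(fragment: str) -> str:
    # Escape the characters that cannot appear raw inside a quoted string.
    escaped = (
        fragment.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^<{XML_LITERAL}>'


# The fragment already carries the xmlns declaration on its top element.
fragment = (
    'E = mc<sup xmlns="http://www.w3.org/1999/xhtml">2</sup>'
    ": The Most Urgent Problem of Our Time"
)
term = xml_literal_term(fragment)
print(term)
```

The resulting term has the same shape as the literal in Ben's proposed
query for Test Case 11, quoted below.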

Ie, *as far as the RDFa syntax document goes*, I think that the 
modifications to be done are purely editorial:

- the SPARQL tests should be updated to add the xmlns namespaces 
somewhere on the top xml elements (ie, the <sup> in our Einstein examples)

- an informative note may have to be added to the syntax text warning 
implementers that they have to add the necessary namespaces

- maybe an extra test should be added to the suite that checks whether 
the xml:lang attribute has been properly added to an XMLLiteral output, too.
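
As an illustration of what such a check could look like, here is a rough
sketch in Python using the standard xml.etree.ElementTree module. The
function names (top_elements, is_self_contained) and the <wrapper> element
are assumptions made for this example, not part of the test suite. The
idea is simply that every top-level element of the literal must be in the
XHTML namespace and, if a language is expected, must carry xml:lang.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def top_elements(fragment):
    # An XMLLiteral may be mixed content without a single root, so wrap it
    # in a namespace-less dummy element before parsing.
    wrapper = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    return list(wrapper)


def is_self_contained(fragment, lang=None):
    # True if every top-level element is in the XHTML namespace and, when
    # a language is expected, carries the matching xml:lang attribute.
    for elem in top_elements(fragment):
        if not elem.tag.startswith("{" + XHTML_NS + "}"):
            return False
        if lang is not None and elem.get(XML_LANG) != lang:
            return False
    return True


bare = "E = mc<sup>2</sup>: The Most Urgent Problem of Our Time"
declared = f'E = mc<sup xmlns="{XHTML_NS}">2</sup>: The Most Urgent Problem of Our Time'
with_lang = f'E = mc<sup xmlns="{XHTML_NS}" xml:lang="en">2</sup>: The Most Urgent Problem of Our Time'

print(is_self_contained(bare))
print(is_self_contained(declared))
print(is_self_contained(declared, lang="en"))
print(is_self_contained(with_lang, lang="en"))
```

Without the xmlns declaration the parser sees a plain "sup" element in no
namespace, which is a different element as far as the Infoset goes; that
is why the declaration has to be added explicitly to the output.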

I also found the extra quote in Michael's mail[1] very useful; I 
actually missed that one...

Ivan

P.S. Caveat: I found a bug in sparqler, for example, which shows that
implementations do not necessarily do all this properly. When comparing two
XMLLiterals, where the attribute values in the data are delimited by " (ie,
<bla attr="aaa"/>) and the SPARQL query said <bla attr='aaa'/>, there
was no match; sparqler does not properly implement the part of
canonicalization that requires all attribute values to be delimited by
double quotes rather than single quotes. Andy acknowledged that this is a
problem in his implementation of canonicalization. This just shows that our
test harness may fail on those tests, and we may want to check the test
results manually for those few tests.
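
For what it is worth, the quoting issue can be reproduced with a few lines
of Python (3.8 or later). This is only a sketch: xml.etree.ElementTree's
canonicalize() implements C14N 2.0, which is not necessarily the same
canonicalization variant that RDF Concepts refers to, but the two agree on
the point at stake here, namely that attribute values end up delimited by
double quotes.

```python
import xml.etree.ElementTree as ET

# The same element, once as it appears in the data and once as written
# in the SPARQL query.
in_data = '<bla attr="aaa"/>'
in_query = "<bla attr='aaa'/>"

# A naive string comparison fails, which matches the behaviour I saw.
print(in_data == in_query)

# After canonicalization both forms use double quotes and compare equal.
canonical_data = ET.canonicalize(in_data)
canonical_query = ET.canonicalize(in_query)
print(canonical_data == canonical_query)
```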



Mark Birbeck wrote:
> Hi Ben,
> 
> I disagree with you and Ivan, I'm afraid.
> 
> Of course, I'm happy to be wrong, since it would make life easier. But
> the bit that needs to be worked through is in the RDF Concepts
> document. The definition of an XML literal lies there, and in my view
> it's quite precise...misguided, but precise.
> 
> So however people reply to this view-point, they need to make some
> reference to RDF Concepts, and say why my interpretation *of that* is
> wrong.
> 
> RDF is independent of syntax, so the concept of an XML literal exists
> *prior* to putting any triples into the triple store; in other words,
> your scheme is wrong, and it needs to be like this:
> 
>   (1) we run the RDFa parser on an input document,
>   (*) the output of the RDFa parser is RDF
>   (2) we take the output of the parser and stuff it into a triple store,
>   (3) we SPARQL against the triple store.
> 
> If we decide to create XML literals, then (a) that must happen
> independent of any triple stores, query interfaces, serialisations,
> etc., and (b) those XML literals must conform to the definition
> provided in RDF Concepts.
> 
> Regards,
> 
> Mark
> 
> On 19/03/2008, Ben Adida <ben@adida.net> wrote:
>>
>>  [I changed the subject from 87 to 97].
>>
>>  Ivan wrote:
>>  > - However, an implementation of RDFa that produces an RDF graph in some
>>  > other serialization (which is the case for a number of our
>>  > implementations, though probably not all; it is certainly true for Fabien's
>>  > xslt script, my stuff, probably Manu's code) has to produce a *valid*
>>  > serialized version of the RDF graph.
>>
>>  After some thought, I've become very wary of implementing a new data
>>  type (or even trying to find an existing one in a different space), and
>>  I agree with Ivan.
>>
>>  Yes, an RDFa parser produces an RDF graph. But an RDF graph is an
>>  abstract notion, so the only thing an RDFa parser *can* produce is
>>  *some* serialization of an RDF graph. As long as that serialization is
>>  *a* valid serialization of the correct RDF graph, the RDFa parser is
>>  compliant, in my opinion.
>>
>>  To be more specific, let's examine how we test an RDFa parser:
>>    (1) we run the RDFa parser on an input document,
>>    (2) we take the output of the parser and stuff it into a triple store,
>>    (3) we SPARQL against the triple store.
>>
>>  Steps (2) and (3) are part of the test harness, they're not part of the
>>  RDFa processor.
>>
>>  So, where does the XMLLiteral canonicalization happen? In my opinion,
>>  somewhere between (2) and (3), meaning somewhere in the triple store,
>>  *after* the RDFa parser has done its thing. After all, we don't expect
>>  the RDFa parser to provide a SPARQL interface, so why should it need to
>>  do XMLLiteral canonicalization if it's never performing graph operations?
>>
>>  So, regarding Test Case 11:
>>
>>  http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.xhtml
>>
>>  I believe we should include the xmlns declaration, and the SPARQL should
>>  read:
>>
>>  ASK WHERE {
>>         <http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.xhtml>
>>  <http://purl.org/dc/elements/1.1/creator> "Albert Einstein" .
>>         <http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.xhtml>
>>  <http://purl.org/dc/elements/1.1/title>
>>  "E = mc<sup xmlns=\"http://www.w3.org/1999/xhtml\">2</sup>: The Most
>>  Urgent Problem of Our
>>  Time"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral> .
>>
>>  }
>>
>>
>>  -Ben
>>
>>
>>  PS: when we implement RDFa in HTML5, we may have to deal with a
>>  different data type. This makes sense: we're extracting markup from the
>>  host language, so the type matches the host language. In the current
>>  case, it's XML.
>>
>>
> 
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Thursday, 20 March 2008 09:04:27 UTC