Re: Returning un-escaped XML literals in SPARQL XML Results

From: Richard Newman <r.newman@reading.ac.uk>
Date: Thu, 9 Aug 2007 21:12:28 -0700
Message-Id: <DAD4EED6-C92D-4CDA-9514-65B451847E20@reading.ac.uk>
Cc: public-rdf-dawg-comments@w3.org
To: Stu Baurmann <stub@logicu.com>

Stu,

   Speaking purely as an implementer: quite besides any schema issues  
(and Andy covered things well, I feel), this poses some quandaries  
for implementations.

   * If you end up with invalid XML marked as XMLLiteral in the store  
(and it does happen), your SPARQL XML output is suddenly invalid. At  
present, every SPARQL XML results document is valid XML, regardless  
of the content of the store.

   * Andy mentions trouble with namespaces. If you were going to be  
thorough about it, the _server_ has to parse each XML literal in  
order to correctly serialize it: it needs to get namespaces correct  
and (to continue my previous point) ensure that it's valid XML, so  
that the output document is valid XML. You're right that it's not  
"insurmountable", but you are shifting the burden of parsing the XML  
from the client to the server, which is rarely a good tradeoff. Both  
ARQ and twinql use basic string operations to build output, because  
SPARQL XML is easy, so you just made our jobs more difficult. You  
will get your results more slowly because of the heavyweight XML  
operations. Results streaming is no longer trivial. This is the  
tradeoff.

   * Existing clients are not expecting there to be a choice in this  
matter. You'd have to introduce an implementation-specific mechanism  
for turning XML non-escaping on for a particular request -- the  
default must be 'off' for backwards compatibility -- and decide on a  
new MIME type to return when it's on, so that clients know how to  
parse it. (In essence, you're defining a new serialization format  
that has a lot in common with SPARQL XML. If you want to do that, go  
ahead; it doesn't affect SPARQL XML.)

   * How is a client application to know that a <literal> contains  
XML rather than string content? Surely it would be more reasonable to  
introduce <xmlLiteral>?

   * Finally: I know that you have a use for this, but I'm not yet  
convinced that there is anyone else who does. In my experience,  
storage of XML in RDF is uncommon, and wanting to get XML out of  
SPARQL results _and process it immediately as XML_ is even more so.  
Usually people who do this kind of thing are storing XHTML, selecting  
it from the SPARQL results using XSLT, and writing it straight out  
into a page without processing. They can do that today.
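
   To make the first two points concrete, here is a minimal Python sketch of the escape-everything approach. It is an illustration only, not how ARQ or twinql are actually written; the helper names `binding` and `results_document` are hypothetical, and the document layout is simplified. The server escapes each literal with plain string operations, so the results document stays well-formed even when the store holds broken XML, and a client that wants the XML parses the unescaped text itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SPARQL_NS = "http://www.w3.org/2005/sparql-results#"
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def binding(name, lexical, datatype=None):
    """Serialize one literal binding using only string operations (hypothetical helper)."""
    dt = f" datatype={quoteattr(datatype)}" if datatype else ""
    # escape() turns <, > and & into entities, so the literal's own markup
    # can never break the enclosing results document.
    return (f"<binding name={quoteattr(name)}>"
            f"<literal{dt}>{escape(lexical)}</literal></binding>")


def results_document(var, literals):
    """Build a simplified SPARQL XML results document for one variable."""
    rows = "".join(f"<result>{binding(var, lex, dt)}</result>"
                   for lex, dt in literals)
    return (f'<sparql xmlns="{SPARQL_NS}">'
            f"<head><variable name={quoteattr(var)}/></head>"
            f"<results>{rows}</results></sparql>")


# One good XHTML fragment and one broken one, both typed as XMLLiteral.
good = '<p xmlns="http://www.w3.org/1999/xhtml">Hi <b>there</b></p>'
bad = "<p>unclosed"
doc = results_document("content", [(good, XML_LITERAL), (bad, XML_LITERAL)])

# The results document parses no matter what the store contained.
root = ET.fromstring(doc)

# Client side: the parser hands back the original string, and the client
# decides whether to parse it as XML and how to cope if that fails.
for lit in root.iter(f"{{{SPARQL_NS}}}literal"):
    try:
        ET.fromstring(lit.text)
    except ET.ParseError:
        pass  # only this one value is unusable; the result set is intact
```

   With unescaped output, the server would have to run something like that last parsing step on every literal before writing it, which is the cost described above.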

   That's my 2¢.

-R

On  2 Aug 2007, at 1:05 PM, Stu Baurmann wrote:

>
> Howdy!
>
> (I have tried sending this before and it didn't seem to go through.
> Apologies if you get multiple copies.)
>
> In September 2006 I posted a question about literal XML inclusion in
> SPARQL results to the jena-dev list, and Andy suggested that I post
> the issue here.  Sorry it's taken me so long to do that.  I don't see
> anywhere on this list that the subject has come up in the meantime,
> so perhaps it's still germane.
>
> When literal XML is stored inside an RDF model, it is in some cases
> desirable to fetch that content as part of a SPARQL XML result stream
> *without escaping*.  For example, consider the storage of XHTML
> content within RDF literals.  It seems reasonable (and works fine) to
> assert a triple like this:
Received on Friday, 10 August 2007 04:12:43 GMT
