Re: Returning un-escaped XML literals in SPARQL XML Results

Hi Stu,

My apologies for the long delay in responding to your comment.

The Working Group discussed your comment on our mailing list and at last 
week's teleconference[1]. While there is some support for the goals of
your suggestion, schedule concerns and the lack of a mature technical
design led the group to decide not to allow unescaped XML literals in the
SPARQL XML result format.

To note the issue and to help inform a potential future working group, I 
opened and immediately postponed an issue[2] regarding unescaped XML in 
the SPARQL XML result format.

Please let us know if you are satisfied with this response to your comment.

Lee

[1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JulSep/att-0175/25-dawg-minutes.html#item02
[2] http://www.w3.org/2001/sw/DataAccess/issues#unescapedXml

Stu Baurmann wrote:
> 
> Howdy!
> 
> (I have tried sending this before and it didn't seem to go through.  
> Apologies if you get multiple copies).
> 
> In September 2006 I posted a question about literal XML inclusion in 
> SPARQL results to the jena-dev list,
> and Andy suggested that I post the issue here.  Sorry it's taken me so 
> long to do that.  I don't see
> anywhere on this list that the subject has come up in the meantime, so 
> perhaps it's still germane.
> 
> When literal XML is stored inside an RDF model, it is in some cases 
> desirable to fetch that content
> as part of a SPARQL XML result stream *without escaping*.   For example, 
> consider the storage of
> XHTML content within RDF literals.  It seems reasonable (and works fine) 
> to assert a triple like this:
> 
> s = m:someDocument
> p = m:hasContent
> o = <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of 
> <xh:em>THE</xh:em> paragraph</xh:p>^^rdf:XMLLiteral
> 
> Note that the datatype of the object node is rdf:XMLLiteral.
> Also note that I am only using XHTML as an example, and the literal 
> block could be of any XML type.
> 
> What I would like is to query a model containing this triple, and 
> receive the results as SPARQL-XML,
> with the literal's contents simply included into the result stream as 
> XML, so we would see output like this:
> 
> <binding name="o">
>         <literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
>                  <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em> paragraph</xh:p>
>         </literal>
> </binding>
> 
> (I have hacked up my copy of ARQ to do this, and it works great, for my 
> purposes!).
> 
> This is not the normal behavior of ARQ, however.  ARQ will always
> escape the XML tag characters, turning angle brackets into entity 
> references, and so on.
> 
> I can see how some users might want that escaping, but it also seems 
> reasonable to NOT want it.
> Turning the escaped tags back into parsed XML requires a consumer (who 
> shares my assumptions
> and preferences) to serialize the result set document into a buffer and 
> re-parse it from text, which
> is not fun or fast.  For my use case (embedding RDF technology within 
> an established content
> management application based on the Cocoon XML-pipeline framework), it 
> is very nice to have the
> result set available as a single unbroken XML tree, which is immediately 
> ready for downstream
> processing using XSLT.  With this feature available, the embedding of 
> small fragments of XML
> content within RDF models becomes quite attractive in some situations.
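
To illustrate the extra parse step described above, here is a minimal
sketch in Python using the standard library's xml.etree.ElementTree. It is
not ARQ code and not part of the original proposal. The two sample
documents, the constant names, and the helper functions
paragraph_from_escaped and paragraph_from_unescaped are assumptions made
for illustration, and the SPARQL results namespace is left out to match
the simplified fragment quoted above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RDF_XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

# Escaped form: the literal's markup arrives as character data.
ESCAPED = (
    '<binding name="o"><literal datatype="' + RDF_XML_LITERAL + '">'
    '&lt;xh:p xmlns:xh="' + XHTML_NS + '"&gt;Contents of '
    '&lt;xh:em&gt;THE&lt;/xh:em&gt; paragraph&lt;/xh:p&gt;'
    '</literal></binding>'
)

# Unescaped form (the behaviour requested in the message): the literal's
# markup is ordinary child content of <literal>.
UNESCAPED = (
    '<binding name="o"><literal datatype="' + RDF_XML_LITERAL + '">'
    '<xh:p xmlns:xh="' + XHTML_NS + '">Contents of '
    '<xh:em>THE</xh:em> paragraph</xh:p>'
    '</literal></binding>'
)


def paragraph_from_escaped(result_xml):
    """Return the embedded element from an escaped result (two parses)."""
    literal = ET.fromstring(result_xml).find("literal")
    # The markup is plain text at this point, so it must be parsed again.
    return ET.fromstring(literal.text)


def paragraph_from_unescaped(result_xml):
    """Return the embedded element from an unescaped result (one parse)."""
    literal = ET.fromstring(result_xml).find("literal")
    # The markup is already part of the tree: take the first child element.
    return literal[0]


if __name__ == "__main__":
    for element in (paragraph_from_escaped(ESCAPED),
                    paragraph_from_unescaped(UNESCAPED)):
        print(element.tag, ET.tostring(element, encoding="unicode"))
```

Both helpers end up with the same xh:p element. The difference is that the
escaped form needs a second parse of the literal's text before XSLT or any
other tree-based processing can reach the embedded markup.
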
> 
> To me it would be reasonable to control this behaviour ("to escape or 
> not to escape") at the SPARQL
> query engine API level, probably by setting a flag on the ResultSet 
> object.    I proposed this on the
> jena-dev list (with the simple implementation that I had hacked up for 
> my own use), and Andy gave a
> very comprehensive and thoughtful response indicating how this 
> serialization issue relates to
> the design of the XML Schema for SPARQL results, the defined lexical 
> form of the results
> in the spec, and concerns about reparsing the literals in downstream 
> processes:
> 
> http://tech.groups.yahoo.com/group/jena-dev/message/25395
> 
> I understand the desire to keep schemas tight and not have gratuitous 
> XSD:ANY's flying around.
> But, on the other hand, it seems to me that RDF+SPARQL users who choose 
> to use the XMLLiteral
> datatype are essentially choosing to store arbitrary XML within their 
> RDF, and they are tagging
> it as such.  So, if we want to support that use case, allowing the 
> <literal> return block
> to contain XML, and using the XSD:ANY schema type to implement the 
> facility seems appropriate.
> I do see that there are some choices which would need to be made in the 
> face of the limited
> expressiveness of the XML-schema standard, and I won't launch into a 
> discussion of those
> details unless/until others are interested.
> 
> I can also see that some apparent collision issues could arise if 
> literal content uses the
> default namespace, but these don't appear insurmountable.  Again, I won't
> launch into examples until others have had a chance to respond in 
> general terms.
> (Perhaps this whole issue was already debated somewhere before).
> 
> In summary:  I think that if there were a way for SPARQL engines to 
> (optionally) return
> XMLLiterals without escaping the tags (and preferably, without violating 
> applicable
> standards), that would be peachy.
> 
> sincerely,
> 
> Stu Baurmann
> 
> 
> 
> 

Received on Monday, 1 October 2007 00:46:26 UTC