Re: Returning un-escaped XML literals in SPARQL XML Results from Stu Baurmann on 2007-10-05 (public-rdf-dawg-comments@w3.org from October 2007)

From: Stu Baurmann <stub@logicu.com>
Date: Thu, 04 Oct 2007 21:47:15 -0500
To: Lee Feigenbaum <lee@thefigtrees.net>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <4705A5B3.2090200@logicu.com>
Hi Lee,

I read through the group's discussion.   I'm glad that some members agree
that this feature would be useful.  I understand that there are some design
challenges which will be difficult to resolve cleanly and quickly.   I hope
that this feature can be taken up as a scheduled design task in some future
round of the group's activity.    Until then, perhaps some implementors
will take the opportunity to offer raw XML output as a non-standard extension,
with appropriate caveats applied (e.g. "this extension only produces
well-formed output if conditions A, B, C hold on the stored XML fragment").
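One condition such an extension could check before emitting a literal raw is whether its lexical form is well-formed XML content on its own. Below is a minimal Python sketch using only the standard library's ElementTree parser; the helper name `embeddable` is hypothetical. It wraps the lexical form in a throwaway element and reports whether that parses. This covers well-formedness, undefined entities, and namespace prefixes that are used but never declared. It is not a complete list of conditions.

```python
import xml.etree.ElementTree as ET

def embeddable(lexical):
    """Return True if an XMLLiteral's lexical form could be copied raw into a
    <literal> element without making the results document ill-formed.
    (Hypothetical helper; checks well-formedness only.)"""
    try:
        # Wrapping allows mixed content (text plus several sibling elements).
        ET.fromstring(f"<wrapper>{lexical}</wrapper>")
    except ET.ParseError:
        # Not well-formed, uses an undefined entity, or uses a namespace
        # prefix that the literal never declares.
        return False
    return True
```

A serializer could call this for each literal and fall back to the normal escaped form whenever it returns False.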

So, yes, I am satisfied, and thanks for considering my suggestion!

peace,

Stu

>> Hi Stu,
>>
>> My apologies for the long delay in responding to your comment.
>>
>> The Working Group discussed your comment on our mailing list and at 
>> last week's teleconference[1]. While there is some measure of support 
>> for the goals of your suggestion, the combination of schedule concerns 
>> with a lack of a mature existing technical design led to the group 
>> choosing not to add the possibility for unescaped XML literals in the 
>> SPARQL XML result format.
>>
>> To note the issue and to help inform a potential future working group, 
>> I opened and immediately postponed an issue[2] regarding unescaped XML 
>> in the SPARQL XML result format.
>>
>> Please let us know if you are satisfied with this response to your 
>> comment.
>>
>> Lee
>>
>> [1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JulSep/att-0175/25-dawg-minutes.html#item02
>>
>> [2] http://www.w3.org/2001/sw/DataAccess/issues#unescapedXml
>>
>> Stu Baurmann wrote:
>>>
>>> Howdy!
>>>
>>> (I have tried sending this before and it didn't seem to go through.
>>> Apologies if you get multiple copies).
>>>
>>> In September 2006 I posted a question about literal XML inclusion in
>>> SPARQL results to the jena-dev list, and Andy suggested that I post
>>> the issue here.  Sorry it's taken me so long to do that.  I don't see
>>> anywhere on this list that the subject has come up in the meantime,
>>> so perhaps it's still germane.
>>>
>>> When literal XML is stored inside an RDF model, it is in some cases
>>> desirable to fetch that content as part of a SPARQL XML result stream
>>> *without escaping*.  For example, consider the storage of XHTML
>>> content within RDF literals.  It seems reasonable (and works fine) to
>>> assert a triple like this:
>>>
>>> s = m:someDocument
>>> p = m:hasContent
>>> o = <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em> paragraph</xh:p>^^rdf:XMLLiteral
>>>
>>> Note that the datatype of the object node is rdf:XMLLiteral.  Also
>>> note that I am only using XHTML as an example; the literal could
>>> contain content from any XML vocabulary.
>>>
>>> What I would like is to query a model containing this triple and
>>> receive the results as SPARQL-XML, with the literal's contents simply
>>> included in the result stream as XML, so we would see output like
>>> this:
>>>
>>> <binding name="o">
>>>         <literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
>>>                  <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em> paragraph</xh:p>
>>>         </literal>
>>> </binding>
>>>
>>> (I have hacked up my copy of ARQ to do this, and it works great, for 
>>> my purposes!).
>>>
>>> This is not the normal behavior of ARQ, however.  ARQ will always
>>> escape the XML tag characters, turning angle brackets into entity 
>>> references, and so on.
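The difference between the standard escaped output and the raw output described here can be shown with a small Python sketch. The function `binding_xml` and its `escape_xml_literals` flag are hypothetical. They are modelled loosely on the ResultSet-level flag proposed later in this message and are not an actual ARQ API. Escaping uses `xml.sax.saxutils.escape`, which replaces `&`, `<` and `>` with entity references. The raw mode copies the lexical form into the element unchanged, so it is only safe when that form is well-formed XML content.

```python
from xml.sax.saxutils import escape, quoteattr

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

def binding_xml(name, lexical, escape_xml_literals=True):
    """Serialize one SPARQL XML <binding> holding an rdf:XMLLiteral.

    The sparql-results default namespace is assumed to be declared on an
    enclosing <sparql> element, so it is omitted here.
    """
    if escape_xml_literals:
        # Standard behavior: the lexical form becomes character data.
        body = escape(lexical)
    else:
        # Raw behavior (hypothetical extension): spliced in as markup, unchecked.
        body = lexical
    return (f"<binding name={quoteattr(name)}>"
            f"<literal datatype={quoteattr(XML_LITERAL)}>{body}</literal>"
            f"</binding>")

xhtml = ('<xh:p xmlns:xh="http://www.w3.org/1999/xhtml">'
         'Contents of <xh:em>THE</xh:em> paragraph</xh:p>')
print(binding_xml("o", xhtml))
print(binding_xml("o", xhtml, escape_xml_literals=False))
```

When an XML parser reads the first string, it yields a `literal` element whose text is the original markup as a plain string. The second yields a `literal` element with a real `xh:p` child, which is the form a downstream XSLT stage can match directly.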
>>>
>>> I can see how some users might want that escaping, but it also seems
>>> reasonable NOT to want it.  Turning the escaped tags back into parsed
>>> XML requires a consumer (who shares my assumptions and preferences)
>>> to serialize the result set document into a buffer and re-parse it
>>> from text, which is neither fun nor fast.  In my use case (embedding
>>> RDF technology within an established content management application
>>> based on the Cocoon XML-pipeline framework), it is very nice to have
>>> the result set available as a single unbroken XML tree, which is
>>> immediately ready for downstream processing using XSLT.  With this
>>> feature available, embedding small fragments of XML content within
>>> RDF models becomes quite attractive in some situations.
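The consumer-side workaround described above (parse the results, then parse each escaped literal a second time) might look like the following Python sketch, which uses the standard library's ElementTree. The function name `unescape_xml_literals` is hypothetical. The sketch assumes that each XMLLiteral's lexical form is well-formed XML content that declares all of its own namespace prefixes. It rewrites the tree in memory instead of re-serializing to text.

```python
import xml.etree.ElementTree as ET

SR = "{http://www.w3.org/2005/sparql-results#}"
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

def unescape_xml_literals(results_xml):
    """Parse a SPARQL XML results document, then replace the escaped text of
    every rdf:XMLLiteral <literal> with the parsed XML it represents."""
    root = ET.fromstring(results_xml)
    # Materialize the list first so the tree is not modified mid-iteration.
    for literal in list(root.iter(f"{SR}literal")):
        if literal.get("datatype") != XML_LITERAL or not literal.text:
            continue
        # Second parse: wrap the lexical form so mixed content
        # (text plus several elements) still forms a single document.
        wrapper = ET.fromstring(f"<wrapper>{literal.text}</wrapper>")
        literal.text = wrapper.text
        literal.extend(list(wrapper))
    return root

doc = (
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#"><results><result>'
    f'<binding name="o"><literal datatype="{XML_LITERAL}">'
    '&lt;xh:p xmlns:xh="http://www.w3.org/1999/xhtml"&gt;Contents of '
    '&lt;xh:em&gt;THE&lt;/xh:em&gt; paragraph&lt;/xh:p&gt;'
    '</literal></binding></result></results></sparql>'
)
root = unescape_xml_literals(doc)
p = root.find(f".//{SR}literal")[0]
print(p.tag, repr(p.text))
```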
>>>
>>> To me it would be reasonable to control this behavior ("to escape or
>>> not to escape") at the SPARQL query engine API level, probably by
>>> setting a flag on the ResultSet object.  I proposed this on the
>>> jena-dev list (with the simple implementation that I had hacked up for
>>> my own use), and Andy gave a very comprehensive and thoughtful
>>> response explaining how this serialization issue relates to the
>>> design of the XML Schema for SPARQL results, the defined lexical form
>>> of the results in the spec, and concerns about reparsing the literals
>>> in downstream processes:
>>>
>>> http://tech.groups.yahoo.com/group/jena-dev/message/25395
>>>
>>> I understand the desire to keep schemas tight and not have gratuitous
>>> xsd:any wildcards flying around.  On the other hand, it seems to me
>>> that RDF+SPARQL users who choose the XMLLiteral datatype are
>>> essentially choosing to store arbitrary XML within their RDF, and
>>> they are tagging it as such.  So, if we want to support that use
>>> case, allowing the <literal> element to contain XML, and using an
>>> xsd:any wildcard in the schema to implement the facility, seems
>>> appropriate.  I do see that some choices would need to be made in the
>>> face of the limited expressiveness of the XML Schema standard, and I
>>> won't launch into a discussion of those details unless/until others
>>> are interested.
>>>
>>> I can also see that some apparent collision issues could arise if
>>> literal content uses the default namespace, but these don't appear
>>> insurmountable.  Again, I won't launch into examples until others
>>> have had a chance to respond in general terms.
>>> (Perhaps this whole issue was already debated somewhere before).
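Here is a concrete instance of the default-namespace collision. In a SPARQL XML results document the sparql-results namespace is the default, so an unprefixed, un-namespaced element that is copied raw into `<literal>` silently ends up in that namespace. The Python sketch below splices a lexical form into a minimal results document and reports the qualified tags the parser assigns; the helper names `embed` and `embedded_tags` are hypothetical. One possible mitigation, shown in the usage, is for a serializer to add `xmlns=""` to the literal's top-level elements to undeclare the default namespace. Whether a specification should require that is an open design choice, not something this sketch settles.

```python
import xml.etree.ElementTree as ET

SR = "http://www.w3.org/2005/sparql-results#"

def embed(lexical):
    """Splice a lexical form, unescaped, into a minimal results document."""
    return (f'<sparql xmlns="{SR}"><results><result><binding name="o">'
            f'<literal>{lexical}</literal>'
            '</binding></result></results></sparql>')

def embedded_tags(lexical):
    """Return the qualified tags the parser gives the literal's children."""
    literal = ET.fromstring(embed(lexical)).find(f".//{{{SR}}}literal")
    return [child.tag for child in literal]

# Unprefixed element: captured by the sparql-results default namespace.
print(embedded_tags("<p>plain</p>"))
# Same element with the default namespace undeclared: stays in no namespace.
print(embedded_tags('<p xmlns="">plain</p>'))
```

Prefixed content such as the XHTML example carries its own namespace declaration, so it is not affected.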
>>>
>>> In summary: I think that if there were a way for SPARQL engines to
>>> (optionally) return XMLLiterals without escaping the tags (and,
>>> preferably, without violating applicable standards), that would be
>>> peachy.
>>>
>>> sincerely,
>>>
>>> Stu Baurmann
Received on Friday, 5 October 2007 02:47:34 UTC