- From: Stu Baurmann <stub@logicu.com>
- Date: Thu, 04 Oct 2007 21:47:15 -0500
- To: Lee Feigenbaum <lee@thefigtrees.net>
- CC: public-rdf-dawg-comments@w3.org
Hi Lee,
I read through the group's discussion. I'm glad that some members agree
that this feature would be useful. I understand that there are some design
challenges which will be difficult to resolve cleanly and quickly. I hope
that this feature can be taken up as a scheduled design task in some future
round of the group's activity. Until then, perhaps some implementors
will take the opportunity to offer raw XML output as a non-standard extension,
with appropriate caveats applied (e.g. "this extension only produces
well-formed output if conditions A, B, C hold on the stored XML fragment").
So, yes, I am satisfied, and thanks for considering my suggestion!
peace,
Stu
>> Hi Stu,
>>
>> My apologies for the long delay in responding to your comment.
>>
>> The Working Group discussed your comment on our mailing list and at
>> last week's teleconference[1]. While there is some measure of support
>> for the goals of your suggestion, the combination of schedule concerns
>> with a lack of a mature existing technical design led to the group
>> choosing not to add the possibility for unescaped XML literals in the
>> SPARQL XML result format.
>>
>> To note the issue and to help inform a potential future working group,
>> I opened and immediately postponed an issue[2] regarding unescaped XML
>> in the SPARQL XML result format.
>>
>> Please let us know if you are satisfied with this response to your
>> comment.
>>
>> Lee
>>
>> [1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JulSep/att-0175/25-dawg-minutes.html#item02
>>
>> [2] http://www.w3.org/2001/sw/DataAccess/issues#unescapedXml
>>
>> Stu Baurmann wrote:
>>>
>>> Howdy!
>>>
>>> (I have tried sending this before and it didn't seem to go through.
>>> Apologies if you get multiple copies).
>>>
>>> In September 2006 I posted a question about literal XML inclusion in
>>> SPARQL results to the jena-dev list, and Andy suggested that I post
>>> the issue here. Sorry it's taken me so long to do that. I don't see
>>> anywhere on this list that the subject has come up in the meantime,
>>> so perhaps it's still germane.
>>>
>>> When literal XML is stored inside an RDF model, it is in some cases
>>> desirable to fetch that content as part of a SPARQL XML result stream
>>> *without escaping*. For example, consider the storage of XHTML content
>>> within RDF literals. It seems reasonable (and works fine) to assert a
>>> triple like this:
>>>
>>> s = m:someDocument
>>> p = m:hasContent
>>> o = <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em> paragraph</xh:p>^^rdf:XMLLiteral
>>>
>>> Note that the datatype of the object node is rdf:XMLLiteral.
>>> Also note that I am only using XHTML as an example, and the literal
>>> block could be of any XML type.
>>>
>>> What I would like is to query a model containing this triple and
>>> receive the results as SPARQL-XML, with the literal's contents simply
>>> included in the result stream as XML, so we would see output like
>>> this:
>>>
>>> <binding name="o">
>>>   <literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
>>>     <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em> paragraph</xh:p>
>>>   </literal>
>>> </binding>
>>>
>>> (I have hacked up my copy of ARQ to do this, and it works great, for
>>> my purposes!).
>>>
>>> This is not the normal behavior of ARQ, however. ARQ will always
>>> escape the XML tag characters, turning angle brackets into entity
>>> references, and so on.
>>>
>>> I can see how some users might want that escaping, but it also seems
>>> reasonable NOT to want it. Turning the escaped tags back into parsed
>>> XML requires a consumer (who shares my assumptions and preferences)
>>> to serialize the result set document into a buffer and re-parse it
>>> from text, which is neither fun nor fast. In my use case (embedding
>>> RDF technology within an established content management application
>>> based on the Cocoon XML-pipeline framework), it is very nice to have
>>> the result set available as a single unbroken XML tree, which is
>>> immediately ready for downstream processing using XSLT. With this
>>> feature available, embedding small fragments of XML content within
>>> RDF models becomes quite attractive in some situations.
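To make the re-parse step concrete, here is a minimal Python sketch (standard library only; not taken from the thread) of what such a consumer has to do with the standard, escaped output: parse the SPARQL XML results, pick out each `rdf:XMLLiteral` literal, and parse its text a second time. It assumes each literal holds a single root element that declares its own namespaces, as in the `xh:p` example; XMLLiterals can in general be fragments with several top-level nodes, which this sketch does not handle. The function name `xml_literal_trees` is hypothetical.

```python
import xml.etree.ElementTree as ET

SRX = "{http://www.w3.org/2005/sparql-results#}"
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

def xml_literal_trees(results_xml: str) -> list[ET.Element]:
    """Return each rdf:XMLLiteral binding as a parsed element tree.

    Assumes every XMLLiteral is a single, self-contained root element.
    """
    root = ET.fromstring(results_xml)  # first parse: the results document
    trees = []
    for lit in root.iter(SRX + "literal"):
        if lit.get("datatype") == XML_LITERAL and lit.text:
            # lit.text is already unescaped ("<xh:p ...>"), but it is only a
            # string; a second parse is needed to get real elements back.
            trees.append(ET.fromstring(lit.text.strip()))
    return trees

# Escaped output, as a standard serializer would produce it.
doc = (
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="o"/></head><results><result>'
    '<binding name="o"><literal datatype="'
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">'
    '&lt;xh:p xmlns:xh="http://www.w3.org/1999/xhtml"&gt;Contents of '
    '&lt;xh:em&gt;THE&lt;/xh:em&gt; paragraph&lt;/xh:p&gt;'
    '</literal></binding></result></results></sparql>'
)

[p] = xml_literal_trees(doc)
print(p.tag)  # → {http://www.w3.org/1999/xhtml}p
```

In a pipeline that already holds the results as a parsed tree (as in Cocoon), the first parse is replaced by a serialize-then-parse round trip, which is the cost described above.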
>>>
>>> To me it would be reasonable to control this behaviour ("to escape or
>>> not to escape") at the SPARQL query engine API level, probably by
>>> setting a flag on the ResultSet object. I proposed this on the
>>> jena-dev list (with the simple implementation that I had hacked up
>>> for my own use), and Andy gave a very comprehensive and thoughtful
>>> response indicating how this serialization issue relates to the
>>> design of the XML Schema for SPARQL results, the defined lexical form
>>> of the results in the spec, and concerns about reparsing the literals
>>> in downstream processes:
>>>
>>> http://tech.groups.yahoo.com/group/jena-dev/message/25395
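The proposed flag can be sketched in Python with `xml.etree.ElementTree`. The `literal_binding` function and its `escape_xml_literals` parameter are hypothetical illustrations of the proposal, not part of ARQ's API or of the SPARQL results format. With the flag off, the sketch assumes the lexical form is a single well-formed element carrying its own namespace declarations.

```python
import xml.etree.ElementTree as ET

SRX_NS = "http://www.w3.org/2005/sparql-results#"
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

def literal_binding(name: str, lexical_form: str, datatype: str,
                    escape_xml_literals: bool = True) -> ET.Element:
    """Build a SPARQL-results <binding> element for a typed literal.

    escape_xml_literals is a hypothetical flag illustrating the proposal;
    it is not an ARQ option or part of the SPARQL results format.
    """
    binding = ET.Element(f"{{{SRX_NS}}}binding", name=name)
    lit = ET.SubElement(binding, f"{{{SRX_NS}}}literal", datatype=datatype)
    if datatype == XML_LITERAL and not escape_xml_literals:
        # Proposed behaviour: splice the fragment in as real child elements.
        # Assumes a single well-formed root element with its own xmlns.
        lit.append(ET.fromstring(lexical_form))
    else:
        # Standard behaviour: the lexical form is character data, so the
        # serializer writes '<' and '>' as '&lt;' and '&gt;'.
        lit.text = lexical_form
    return binding

fragment = ('<xh:p xmlns:xh="http://www.w3.org/1999/xhtml">'
            'Contents of <xh:em>THE</xh:em> paragraph</xh:p>')

escaped = literal_binding("o", fragment, XML_LITERAL)
embedded = literal_binding("o", fragment, XML_LITERAL, escape_xml_literals=False)
print(ET.tostring(escaped, encoding="unicode"))
print(ET.tostring(embedded, encoding="unicode"))
```

ElementTree picks its own prefixes (`ns0`, `ns1`) when serializing unless they are registered with `ET.register_namespace`; the expanded element names are what downstream XSLT would match on.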
>>>
>>> I understand the desire to keep schemas tight and not have gratuitous
>>> xsd:any wildcards flying around. But, on the other hand, it seems to
>>> me that RDF+SPARQL users who choose to use the XMLLiteral datatype
>>> are essentially choosing to store arbitrary XML within their RDF, and
>>> they are tagging it as such. So, if we want to support that use case,
>>> allowing the <literal> return block to contain XML, and using the
>>> xsd:any schema construct to implement the facility, seems
>>> appropriate. I do see that some choices would need to be made in the
>>> face of the limited expressiveness of the XML Schema standard, and I
>>> won't launch into a discussion of those details unless/until others
>>> are interested.
>>>
>>> I can also see that some apparent collision issues could arise if
>>> literal content uses the default namespace, but these don't appear
>>> insurmountable. Again, I won't launch into examples until others have
>>> had a chance to respond in general terms. (Perhaps this whole issue
>>> has already been debated somewhere.)
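A short Python sketch (standard library only; the helper `embedded_tag` is hypothetical) shows the kind of default-namespace collision meant here. The SPARQL results document declares `http://www.w3.org/2005/sparql-results#` as its default namespace, so a spliced-in fragment whose unprefixed elements rely on an inherited default namespace silently lands in the SPARQL namespace, while a fragment that declares its own default namespace keeps it.

```python
import xml.etree.ElementTree as ET

SRX_NS = "http://www.w3.org/2005/sparql-results#"

def embedded_tag(fragment: str) -> str:
    """Splice `fragment` verbatim into a results document whose default
    namespace is the SPARQL results namespace, then return the expanded
    name the parser gives the fragment's root element."""
    doc = (f'<sparql xmlns="{SRX_NS}"><results><result>'
           f'<binding name="o"><literal>{fragment}</literal></binding>'
           f'</result></results></sparql>')
    root = ET.fromstring(doc)
    literal = root.find(f".//{{{SRX_NS}}}literal")
    return literal[0].tag

# No declaration: <p> inherits the SPARQL results namespace.
print(embedded_tag("<p>hello</p>"))
# → {http://www.w3.org/2005/sparql-results#}p

# Own default-namespace declaration: <p> stays in XHTML.
print(embedded_tag('<p xmlns="http://www.w3.org/1999/xhtml">hello</p>'))
# → {http://www.w3.org/1999/xhtml}p

# xmlns="" undeclares the default namespace: <p> is in no namespace.
print(embedded_tag('<p xmlns="">hello</p>'))
# → p
```

Under these assumptions, a serializer could avoid the capture by making sure the fragment's root element carries an explicit default-namespace declaration (the intended namespace, or `xmlns=""`).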
>>>
>>> In summary: I think that if there were a way for SPARQL engines to
>>> (optionally) return XMLLiterals without escaping the tags (and,
>>> preferably, without violating applicable standards), that would be
>>> peachy.
>>>
>>> sincerely,
>>>
>>> Stu Baurmann
Received on Friday, 5 October 2007 02:47:34 UTC