- From: Stu Baurmann <stub@logicu.com>
- Date: Thu, 02 Aug 2007 13:05:48 -0700
- To: public-rdf-dawg-comments@w3.org
Howdy!

(I have tried sending this before and it didn't seem to go through. Apologies if you get multiple copies.)

In September 2006 I posted a question about literal XML inclusion in SPARQL results to the jena-dev list, and Andy suggested that I post the issue here. Sorry it's taken me so long to do that. I don't see anywhere on this list that the subject has come up in the meantime, so perhaps it's still germane.

When literal XML is stored inside an RDF model, it is in some cases desirable to fetch that content as part of a SPARQL XML result stream *without escaping*. For example, consider the storage of XHTML content within RDF literals. It seems reasonable (and works fine) to assert a triple like this:

    s = m:someDocument
    p = m:hasContent
    o = <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em> paragraph</xh:p>^^rdf:XMLLiteral

Note that the datatype of the object node is rdf:XMLLiteral. Also note that I am only using XHTML as an example, and the literal block could be of any XML type.

What I would like is to query a model containing this triple and receive the results as SPARQL-XML, with the literal's contents simply included in the result stream as XML, so we would see output like this:

    <binding name="o">
      <literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
        <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em> paragraph</xh:p>
      </literal>
    </binding>

(I have hacked up my copy of ARQ to do this, and it works great for my purposes!)

This is not the normal behavior of ARQ, however. ARQ will always escape the XML tag characters, turning angle brackets into entity references, and so on. I can see how some users might want that escaping, but it also seems reasonable to NOT want it. Turning the escaped tags back into parsed XML requires a consumer (who shares my assumptions and preferences) to serialize the result set document into a buffer and re-parse it from text, which is not fun or fast.
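To make the re-parsing cost concrete, here is a minimal Python sketch (not ARQ code) that compares the two forms of the same binding. It is an illustration under stated assumptions: the result fragment is simplified and omits the SPARQL results namespace, and the helper names `binding`, `find_em_escaped`, and `find_em_inline` are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML = "http://www.w3.org/1999/xhtml"
LITERAL = ('<xh:p xmlns:xh="http://www.w3.org/1999/xhtml">'
           'Contents of <xh:em>THE</xh:em> paragraph</xh:p>')


def binding(inner):
    # Simplified result fragment (hypothetical helper); a real SPARQL XML
    # result would also carry the results namespace and wrapper elements.
    return ('<binding name="o"><literal datatype='
            '"http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">'
            + inner + '</literal></binding>')


escaped_doc = binding(escape(LITERAL))  # tags become &lt; ... &gt;
inline_doc = binding(LITERAL)           # tags included as real XML


def find_em_escaped(doc):
    literal = ET.fromstring(doc).find("literal")
    # The markup arrives as plain text, so a second parse is needed
    # before the consumer can navigate it.
    fragment = ET.fromstring(literal.text)
    return fragment.find("{%s}em" % XHTML).text


def find_em_inline(doc):
    literal = ET.fromstring(doc).find("literal")
    # The markup is already part of the tree: one parse, direct navigation.
    return literal.find(".//{%s}em" % XHTML).text
```

In the escaped form the `literal` element has no child elements and the consumer must parse its text a second time; in the inline form the XHTML is reachable from the single parsed tree, which is what an XSLT stage in a pipeline would want.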
For my use case (embedding RDF technology within an established content management application based on the Cocoon XML-pipeline framework), it is very nice to have the result set available as a single unbroken XML tree, which is immediately ready for downstream processing using XSLT. With this feature available, the embedding of small fragments of XML content within RDF models becomes quite attractive in some situations.

To me it would be reasonable to control this behaviour ("to escape or not to escape") at the SPARQL query engine API level, probably by setting a flag on the ResultSet object.

I proposed this on the jena-dev list (with the simple implementation that I had hacked up for my own use), and Andy gave a very comprehensive and thoughtful response indicating how this serialization issue relates to the design of the XML Schema for SPARQL results, the defined lexical form of the results in the spec, and concerns about reparsing the literals in downstream processes:

http://tech.groups.yahoo.com/group/jena-dev/message/25395

I understand the desire to keep schemas tight and not have gratuitous XSD:ANY's flying around. But, on the other hand, it seems to me that RDF+SPARQL users who choose to use the XMLLiteral datatype are essentially choosing to store arbitrary XML within their RDF, and they are tagging it as such. So, if we want to support that use case, allowing the <literal> return block to contain XML, and using the XSD:ANY schema type to implement the facility, seems appropriate.

I do see that there are some choices which would need to be made in the face of the limited expressiveness of the XML-schema standard, and I won't launch into a discussion of those details unless/until others are interested. I can also see that some apparent collision issues could arise if literal content uses the default namespace, but these don't appear insurmountable. Again, I won't launch into examples until others have had a chance to respond in general terms.
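As a rough illustration of the "to escape or not to escape" flag, here is a small Python sketch of a literal serializer. This is not the ARQ API and not my actual patch: the function `write_literal` and its `escape_xml_literals` parameter are hypothetical names chosen for this example, and the default is assumed to be the current escaping behaviour.

```python
from xml.sax.saxutils import escape, quoteattr

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def write_literal(lexical, datatype=None, escape_xml_literals=True):
    """Serialize one <literal> element (hypothetical helper).

    Only literals typed as rdf:XMLLiteral are ever written unescaped,
    and only when the caller explicitly turns escaping off.
    """
    attr = "" if datatype is None else " datatype=" + quoteattr(datatype)
    if datatype == XML_LITERAL and not escape_xml_literals:
        body = lexical          # include the stored XML as-is
    else:
        body = escape(lexical)  # default: entity-escape markup characters
    return "<literal%s>%s</literal>" % (attr, body)
```

Keeping escaping as the default means existing consumers see no change, and plain or otherwise-typed literals are still escaped even when the flag is off, so only content explicitly tagged as XML is passed through.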
(Perhaps this whole issue was already debated somewhere before.)

In summary: I think that if there were a way for SPARQL engines to (optionally) return XMLLiterals without escaping the tags (and preferably, without violating applicable standards), that would be peachy.

sincerely,

Stu Baurmann
Received on Thursday, 9 August 2007 11:50:45 UTC