Re: Returning un-escaped XML literals in SPARQL 1.1 XML results

Stu Baurmann <stub@logicu.com> writes:
> In response to the solicitation of suggestions for new features in SPARQL 1.1,
> I would like to raise this horse from the dead for further beatings:
>
> http://www.w3.org/2001/sw/DataAccess/issues#unescapedXml

I think I've made my feelings about escaped markup pretty clear:

  http://norman.walsh.name/2003/09/16/escmarkup

:-)

> I also think it is now relevant to consider the impact on XProc integration,
> as raised by Paul Tyson on 2009-03-04.   I say this without being well
> versed in XProc, but based on the assumption that un-escaped XML results
> are useful in any pipelined processing context.   I welcome clarifications
> from the more XProc-savvy.

If you've got XML and you want to pass XML through an XML pipeline,
starting with escaped XML is a damned inconvenience.

That said, XProc does have a step for unescaping markup, so it's not
fair to say that you can't deal with escaped markup (at least in XProc
pipelines).

But I'd much rather have a way of storing it unescaped, thank you very much.
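
To make the inconvenience concrete, here is a minimal Python sketch of what a consumer has to do in each case. The unescaped form assumes the markup is simply embedded as children of the result's literal element; that shape is an assumption for illustration, not anything the SPARQL results format defines today.

```python
import xml.etree.ElementTree as ET

SR = "http://www.w3.org/2005/sparql-results#"
XH = "http://www.w3.org/1999/xhtml"
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

# Escaped: the paragraph travels as a string inside <literal>.
ESCAPED = (
    '<binding xmlns="%s" name="o"><literal datatype="%s">'
    '&lt;xh:p xmlns:xh="%s"&gt;Contents of &lt;xh:em&gt;important&lt;/xh:em&gt;'
    ' paragraph&lt;/xh:p&gt;</literal></binding>' % (SR, XML_LITERAL, XH)
)

# Unescaped (assumed shape): the same paragraph embedded as real markup.
UNESCAPED = (
    '<binding xmlns="%s" name="o"><literal datatype="%s">'
    '<xh:p xmlns:xh="%s">Contents of <xh:em>important</xh:em> paragraph</xh:p>'
    '</literal></binding>' % (SR, XML_LITERAL, XH)
)


def em_from_escaped(doc):
    """Find the emphasized text when the literal is escaped markup."""
    literal = ET.fromstring(doc).find("{%s}literal" % SR)
    # The literal's content is only a string, so it needs a second parse.
    inner = ET.fromstring(literal.text)
    return inner.find("{%s}em" % XH).text


def em_from_unescaped(doc):
    """Find the emphasized text when the literal is embedded markup."""
    # One parse; the paragraph is already part of the tree.
    return ET.fromstring(doc).find(".//{%s}em" % XH).text
```

Both functions return the same string; the difference is that the escaped case needs an unescape-and-reparse step before any XML tool can see the content.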

> I know there is some complexity involved in embedding arbitrary
> XML into the results stream.   It might be sensible to make xml-literal
> results an optional feature (both in the sense that SPARQL implementors
> are not required to implement it, and in the sense that SPARQL
> users are not required to use it).  I would also support placing
> restrictions on the XML content that can be returned this way,
> e.g. to address some of the encoding issues addressed by Eric
> Prud'hommeaux here:
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JulSep/0163.html
>
> (Perhaps there's been some progress on c14n in recent months?)

I assume that this embedding is happening at the Infoset level (or in
some other data model abstraction), so losing the XML Declaration
isn't likely to be too problematic (XML 1.1 notwithstanding).

Yes, you have to lose the <!DOCTYPE declaration. So be it. I think
this effort should be described in terms of embedding XML content in
RDF, not in terms of embedding XML *documents* in RDF. Presumably you
can use some other triple to keep track of what its DTD was, and for
that matter what version of XML it was, if those things are important
to you.

Yes, xml:ids can collide. That's ok, the xml:id spec says they can
collide too. It means that you'll get funny results (potentially) if
you do id() queries on an XML serialization of your RDF store. But
really, you're going to get funny results anyway if you do that,
right?

> Regarding XML schemas and implementation, one idea is that
> the XML literal might come wrapped in a child tag of <binding> called
> <xml-literal>, which has content type xsd:any.
> This means the overall SPARQL-results schema would not be
> weakened for any results that do not happen to include <xml-literal>.

Technically, the XML Core WG owns all element names that begin with
"x", "m", and "l" in that order, so some coordination might be
necessary (not that I expect it would be difficult).

> Example of well-formed XHTML content (we could just as well use WSDL,
> a SOAP message, etc.):
>
> <binding name="o">
>   <xml-literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
>      <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>important</xh:em> paragraph</xh:p>
>   </xml-literal>
> </binding>

I'm not close enough to SPARQL to have a good grasp on the relationship between
binding and xml-literal, but

  <xml-literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">

looks a little redundant. Wouldn't simply

  <literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">

be sufficient?
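
As a rough illustration of why the datatype attribute alone might be enough, here is a small Python sketch of a consumer that dispatches on it. The function name literal_value is hypothetical, and the sketch assumes that an XMLLiteral-typed literal carries its markup as child elements.

```python
import xml.etree.ElementTree as ET

SR = "{http://www.w3.org/2005/sparql-results#}"
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def literal_value(binding_xml):
    """Return child elements for an XML literal, plain text otherwise."""
    binding = ET.fromstring(binding_xml)
    literal = binding.find(SR + "literal")
    if literal.get("datatype") == XML_LITERAL:
        # Assumed shape: the markup is embedded as child elements.
        # Text outside those elements is ignored in this sketch.
        return list(literal)
    return literal.text
```

The consumer never needs to look at the wrapper's element name; the datatype already says what kind of content to expect.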

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | Kinship is healing; we are physicians
http://nwalsh.com/            | to each other.--Oliver Sacks

Received on Tuesday, 10 March 2009 20:11:18 UTC