Re: Unescaped XML in the SPARQL XML Result Format and Tuesday's agenda from Ivan Mikhailov on 2007-09-23 (public-rdf-dawg@w3.org from July to September 2007)

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Sun, 23 Sep 2007 15:02:05 +0700
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, 'RDF Data Access Working Group' <public-rdf-dawg@w3.org>
Message-Id: <1190534525.7678.372.camel@master.iv.dev.null>
Andy,

> The <a> is in the default namespace of 
> "http://www.w3.org/2005/sparql-results#" so it's
>    {http://www.w3.org/2005/sparql-results#}a
> which is not HTML nor XHTML.

You're right. But this does not mean that the idea is bad; it means that
you've reported a bug in Virtuoso, and I have now committed a fix. With no
default namespace at the top levels of the result document, there is no
problem.
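
To make the namespace point concrete, here is a minimal Python sketch. The
two documents are made-up illustrations (not actual Virtuoso output), and the
"sr" prefix and the helper name first_literal_child_tag are assumptions of
this example; the prefixed variant is only one way a result document without
a default namespace might look:

```python
import xml.etree.ElementTree as ET

SPARQL_NS = "http://www.w3.org/2005/sparql-results#"

# Variant 1: the results namespace is declared as the DEFAULT namespace,
# so every unprefixed element below it, including <a>, inherits it.
DEFAULT_NS_DOC = (
    '<sparql xmlns="%s">'
    '<results><result><binding name="z"><literal>'
    '<a href="http://example.org/">link</a>'
    '</literal></binding></result></results>'
    '</sparql>' % SPARQL_NS
)

# Variant 2 (hypothetical): the results namespace is bound to a prefix,
# so an unprefixed embedded element stays in no namespace.
PREFIXED_DOC = (
    '<sr:sparql xmlns:sr="%s">'
    '<sr:results><sr:result><sr:binding name="z"><sr:literal>'
    '<a href="http://example.org/">link</a>'
    '</sr:literal></sr:binding></sr:result></sr:results>'
    '</sr:sparql>' % SPARQL_NS
)


def first_literal_child_tag(doc):
    """Return the expanded name of the first element inside <literal>."""
    root = ET.fromstring(doc)
    literal = root.find(".//{%s}literal" % SPARQL_NS)
    return literal[0].tag


inherited_tag = first_literal_child_tag(DEFAULT_NS_DOC)
prefixed_tag = first_literal_child_tag(PREFIXED_DOC)
print(inherited_tag)
print(prefixed_tag)
```

In the first variant the embedded <a> is reported in the sparql-results
namespace, which is exactly the problem you described; in the second it is
reported as a plain "a" in no namespace.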

> 2: HTML need not be well-formed XML

I do not intend to place HTML there; I just want to allow as much XML as
is allowed in RDF/XML's literal parse mode (rdf:parseType="Literal").

> 3: It can't be validated any more.

I agree. OTOH, the lack of validation did not stop RDF/XML developers.

> 4: Escaping in lexical forms is still needed

Of course, any XML output will require escaping of weird strings. But if
we require escaping of whole XML trees, then double escaping of the same
literal forms will be even weirder.
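
A small Python sketch of what I mean by double escaping. The sample strings
are invented for illustration; escape() is the standard helper from
xml.sax.saxutils:

```python
from xml.sax.saxutils import escape

# A plain string literal: the "<" must be escaped once, in any design.
plain = "x<y"
escaped_plain = escape(plain)

# An XML literal whose text content is the same string. Inside the tree
# the "<" is already written as "&lt;".
xml_tree = "<b>x&lt;y</b>"

# Escaping the whole tree as text escapes the markup AND re-escapes the
# "&" of the existing "&lt;", so the inner string is escaped twice.
escaped_tree = escape(xml_tree)

print(escaped_plain)
print(escaped_tree)
```

The inner "x<y" ends up as "x&amp;lt;y" once the whole tree is escaped, which
is the doubly escaped form a reader then has to undo.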

We now have about 10 implementations of SPARQL processors. We
intend to create a format that will be used worldwide by thousands of
developers. The difference in orders of magnitude means that our
personal inconveniences with adjusting implementations simply do not
matter. Moreover, if an implementation writes escaped XML text, that is
still OK; it is enough to be able to read unescaped XML made by others.

Best Regards,
Ivan Mikhailov.


On Sat, 2007-09-22 at 19:00 +0100, Seaborne, Andy wrote:
> Ivan,
> 
> I think it's more complicated than your message suggests.  Simple embedding of 
> XML in SPARQL results does not work and a full solution is quite intricate and 
> makes at least one significant trade-off (loss of schema validation).
> 
> The SPARQL Query Results XML Format might start like this example from 
> Virtuoso which sends literals just as the lexical form:
> 
> <sparql
>   xmlns="http://www.w3.org/2005/sparql-results#"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
> 
> so the default namespace is http://www.w3.org/2005/sparql-results#
> 
> Later on in the results, I have found examples such as this:
> 
>      <binding name="z"><literal><a
> href="http://woodcutters.meard.org/gallery/v/IMG_3999-woodcutter10.jpg.html"><img
> border="0"
> src="http://woodcutters.meard.org/gallery/d/41-2/IMG_3999-woodcutter10.jpg"
> width="100" height="150" /></a></literal></binding>
> 
> which is supposed, I guess, to be embedded HTML in the results format
> 
> 
> 1: Namespaces
> 
> The <a> is in the default namespace of 
> "http://www.w3.org/2005/sparql-results#" so it's
>    {http://www.w3.org/2005/sparql-results#}a
> which is not HTML nor XHTML.
> 
> Similarly, xml:lang would be passed into the subtree.
> 
> This affects namespace-aware XSLT processing.
> 
> 2: HTML need not be well-formed XML
> 
> One motivation is the use with HTML but HTML can't be simply placed textually 
> because it isn't necessarily well-formed XML.
> 
> That "<a>...</a>" was a plain string, no datatype in the <literal> so it has 
> to be escaped anyway.  Even if we allowed nested XML, it still would not 
> be valid unless we go beyond the XML literal style and allowed mixed text and 
> content.
> 
> It might have been "Here is a <a href="">link</a>."
> 
> 3: It can't be validated any more.
> 
> This was one of the original goals of the design - to have valid XML documents 
> with respect to a fixed schema.  Then, receiving systems could validate the 
> XML in the usual way for an XML processing pipeline.
> 
> This would be a significant loss as evidenced by the XML syntax for RDF.
> 
> Going for a modular XML design would take a long time to finish and I don't 
> see how it would not break deployed systems.  It would impose significant 
> complexity on the parsing process.
> 
> 4: Escaping in lexical forms is still needed
> 
> Suppose we have a literal which is "x<y"   The < needs encoding as &lt; 
> regardless of allowing unescaped XML.  A writer already needs to check for 
> escapes.
> 
> 
> I do agree about readability though. :-)
> 
> I do not see any new information and do not support reopening the design of 
> the SPARQL Query Results XML Format.
> 
> 	Andy
> 
> 
> Ivan Mikhailov wrote:
> > Hi everyone,
> > 
> > I vote for support of unescaped XML texts in the ..Results XML...
> > because it could be convenient for XSLT and similar tools that may be
> > used to transform result sets of different formats into each other. It
> > is also definitely more readable. It also resembles the RDF/XML decision.
> > 
> > There should be an attribute that will indicate the difference between a
> > string and an XML entity that consists of a string. I'm in doubt whether
> > we should support generic entities there or just XML trees, so probably
> > we should repeat the RDF/XML decision.
> > 
> > I understand that unescaped XML texts may add problems for some
> > lightweight parsers of the format, but these problems are minor and not
> > common to all implementations, whereas a convenient report format is
> > worthwhile for everybody.
> > 
> > I also understand that this will 'relax' XML Schema of the document but
> > I don't care :)
> > 
> > Best Regards,
> > Ivan Mikhailov.
> > 
> > On Sat, 2007-09-22 at 00:58 -0400, Lee Feigenbaum wrote:
> >> Hi everyone,
> >>
> >> Eric is at risk for Tuesday; Orri and Ivan M can't make it, and I have 
> >> schedule-crunch. We're still doing well towards a decision to move to 
> >> PR, but I think we might shorten up this Tuesday's teleconf and push the 
> >> meat of our work to a week from Tuesday.
> >>
> >> We do have one issue that we need to tackle ASAP:
> >>
> >> On August 2, we received a comment from Stu Baurmann:
> >>
> >> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2007Aug/0005.html
> >>
> >> The message brings up the possibility of including unescaped XML literal 
> >> values in the SPARQL Query Results XML Format. Although Richard Newman 
> >> responded with some technical concerns about the suggestion, the Working 
> >> Group never responded. We owe Stu a response before publishing a CR 
> >> version of the XML results format.
> >>
> >> I'd like to know if there is anyone on the working group who would like 
> >> to consider this suggestion and propose a design for it. I know that 
> >> Andy had some technical concerns about it and there are also, of course, 
> >> schedule concerns, but in the interest of due diligence I wanted to give 
> >> working group members who might support this comment a chance to speak up.
> >>
> >> So please register your support or active lack of support on the mailing 
> >> list if you can, and we'll attempt to dispose of the comment on 
> >> Tuesday's teleconference.
> >>
> >> For Tuesday, I'm picturing taking up this issue and then going over 
> >> where we stand in terms of advancing all three of our specifications to 
> >> PR, and seeing who has what actions on the critical path between here 
> >> and there. I'm hoping to keep the call to 30 minutes.
> >>
> >> The flip side is that I'm expecting a somewhat lengthy call the week 
> >> after -- probably on the order of 90 minutes. Please let us know as soon 
> >> as you can if you cannot make our call on Oct 2.
> >>
> >> Lee
> >>
> > 
> > 
>
Received on Sunday, 23 September 2007 08:02:25 UTC