
Re: Re: Unescaped XML in the SPARQL XML Result Format and Tuesday's agenda

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Sun, 23 Sep 2007 14:04:23 +0700
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, 'RDF Data Access Working Group' <public-rdf-dawg@w3.org>, Thomas Roessler <tlr@w3.org>
Message-Id: <1190531063.7678.354.camel@master.iv.dev.null>

Hi everyone,

I absolutely agree that we can't embed a whole XML document unescaped,
with things like the version declaration, doctypedecl and DTD. RDF/XML
does not have any special support for complete XML documents either.

My intention is to allow XML content, as it is described by
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-rdf-XMLLiteral,
inside <literal>...</literal>. Nothing more. This may result in XML
schema validation problems if the XML literal value is in turn invalid
SPARQL Results XML, but I don't really care. The problem of duplicated
IDs goes away by itself, because when there's no DTD data there are no
attribute types.
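
To illustrate the point about IDs, here is a minimal sketch in Python.
The document and its element names are made up for the illustration.
Python's ElementTree is a non-validating parser, so this only shows the
well-formedness side: without a DTD, "id" is an ordinary attribute and
repeated values are accepted.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two embedded XML values that reuse the same id.
# No DTD is present, so nothing declares "id" to be of type ID.
DOC = """<results>
  <literal><p id="a">first</p></literal>
  <literal><p id="a">second</p></literal>
</results>"""

# Parsing succeeds; the duplicate values are just attribute strings.
root = ET.fromstring(DOC)
ids = [p.get("id") for p in root.iter("p")]
```

A validating parser given a DTD that declares the attribute as ID would
be a different story, which is one more reason to leave the DTD out.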

Stu Baurmann's example is:

<binding name="o">
  <literal
      datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
    <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em>
paragraph</xh:p>
  </literal>
</binding>

It's almost as it should be, but I'd like to have an additional
attribute on <literal> to indicate that the content is XML, so that XML
content and plain strings can be told apart unambiguously.
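
A rough sketch of how a consumer might use such an attribute, in Python.
The attribute name content="xml" is purely an assumption for this
example, since no name has been proposed yet, and literal_value() is a
hypothetical helper. The namespace is the one used by the SPARQL Query
Results XML Format.

```python
import xml.etree.ElementTree as ET

SR = "http://www.w3.org/2005/sparql-results#"
XH = "http://www.w3.org/1999/xhtml"

# Result document with an unescaped XML literal. The content="xml"
# attribute is an assumed name, not part of any specification.
DOC = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="o"/></head>
  <results>
    <result>
      <binding name="o">
        <literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
                 content="xml"><xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>THE</xh:em> paragraph</xh:p></literal>
      </binding>
    </result>
  </results>
</sparql>"""

def literal_value(literal):
    """Return ('xml', child elements) or ('string', text) for a <literal>."""
    if literal.get("content") == "xml":
        return "xml", list(literal)
    return "string", literal.text or ""

root = ET.fromstring(DOC)
literal = root.find(f".//{{{SR}}}binding[@name='o']/{{{SR}}}literal")
kind, value = literal_value(literal)

# Because the content is real XML, it can be navigated directly.
em = literal.find(f"{{{XH}}}p/{{{XH}}}em")
```

Without the attribute, the consumer would have to guess from the
datatype and the presence of child elements, which is exactly the
ambiguity I'd like to avoid.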

The introduction of this syntax does not mean that any typed value of
type http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral MUST be
written unescaped. Tools that do not know anything about XML should
still be able to form the result by escaping the string values of
literals.
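
For comparison, a minimal sketch of the escaped form such a tool could
produce, again in Python. The function name escaped_literal() is made up
for the example; escape() and quoteattr() come from the standard
xml.sax.saxutils module.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

# Lexical form of the XML literal, held by the tool as a plain string.
value = ('<xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of '
         '<xh:em>THE</xh:em> paragraph</xh:p>')

def escaped_literal(lexical_form, datatype):
    """Serialize a typed literal by escaping its lexical form as text."""
    return (f"<literal datatype={quoteattr(datatype)}>"
            f"{escape(lexical_form)}</literal>")

serialized = escaped_literal(value, XML_LITERAL)

# A consumer gets the lexical form back as text, with no child elements.
parsed = ET.fromstring(serialized)
```

Both forms carry the same literal; the escaped one just needs a second
parse on the consumer side before the XML can be navigated.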

I'm sorry I did not raise this issue myself months ago. I saw
mixed="true" in the XML Schema of the format and wrongly concluded that
'mixed' allows arbitrary XML elements in the content, whereas it allows
only text and the child XML elements of the complex type.

Best Regards,
Ivan Mikhailov.

On Sat, 2007-09-22 at 15:56 -0400, Eric Prud'hommeaux wrote:
> * Ivan Mikhailov <imikhailov@openlinksw.com> [2007-09-22 12:56+0700]
> > 
> > Hi everyone,
> > 
> > I vote for support of unescaped XML texts in the ..Results XML...
> > because it could be convenient for XSLT and similar tools that may be
> > used to transform result sets of different formats into each other. It
> > is also definitely more readable. It also resembles the RDF/XML decision.
> > 
> > There should be an attribute that will indicate the difference between a
> > string and an XML entity that consists of a string. I'm in doubt whether
> > we should support generic entities there or just XML trees, so probably
> > we should repeat the RDF/XML decision.
> > 
> > I understand that unescaped XML texts may add problems for some
> > lightweight parsers of the format but these problems are minor and not
> > common to all implementations, whereas a convenient report format is
> > worthwhile for everybody.
> > 
> > I also understand that this will 'relax' XML Schema of the document but
> > I don't care :)
> 
> There are a few issues that make arbitrary XML not embeddable in other
> XML:
>   1. XMLDecl and doctypedecl may only appear in the prolog
>   2. embedding some XML1.1 forces the encapsulating VersionNum to 1.1
>   3. XML IDs can collide, and are even likely to on many realistic queries
>   4. XMLLiteral leans on c14n 1.0, which is being updated.
> 
> http://www.w3.org/TR/xml11/#NT-XMLDecl
> http://www.w3.org/TR/xml11/#NT-doctypedecl
> http://www.w3.org/TR/xml11/#NT-prolog
> 
> While I'd like to be able to look for
>   /sparql/results/result/binding[@name="doc"]/html/head/title
> , I'm sure it would bring us past the end of our extension, and may take
> a very long time. Ultimately, I'd like XML Core or c14n to really take
> up encapsulation so that specs like ours can lean on it. I don't have
> a reason to believe this stuff will be solved in the next couple years.
> You might think that XQuery would have had to address this, but their
> data model allows encapsulations that can't be serialized. If they can't
> do it, I don't even want to try. My cowardly vote is to go to CR Tuesday.
> 
> I've Cc'd Thomas Roessler in case he has relevant c14n info.
> 
> > Best Regards,
> > Ivan Mikhailov.
> > 
> > On Sat, 2007-09-22 at 00:58 -0400, Lee Feigenbaum wrote:
> > > Hi everyone,
> > > 
> > > Eric is at risk for Tuesday; Orri and Ivan M can't make it, and I have 
> > > schedule-crunch. We're still doing well towards a decision to move to 
> > > PR, but I think we might shorten up this Tuesday's teleconf and push the 
> > > meat of our work to a week from Tuesday.
> > > 
> > > We do have one issue that we need to tackle ASAP:
> > > 
> > > On August 2, we received a comment from Stu Baurmann:
> > > 
> > > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2007Aug/0005.html
> > > 
> > > The message brings up the possibility of including unescaped XML literal 
> > > values in the SPARQL Query Results XML Format. Although Richard Newman 
> > > responded with some technical concerns about the suggestion, the Working 
> > > Group never responded. We owe Stu a response before publishing a CR 
> > > version of the XML results format.
> > > 
> > > I'd like to know if there is anyone on the working group who would like 
> > > to consider this suggestion and propose a design for it. I know that 
> > > Andy had some technical concerns about it and there are also, of course, 
> > > schedule concerns, but in the interest of due diligence I wanted to give 
> > > working group members who might support this comment a chance to speak up.
> > > 
> > > So please register your support or active lack of support on the mailing 
> > > list if you can, and we'll attempt to dispose of the comment on 
> > > Tuesday's teleconference.
> > > 
> > > For Tuesday, I'm picturing taking up this issue and then going over 
> > > where we stand in terms of advancing all three of our specifications to 
> > > PR, and seeing who has what actions on the critical path between here 
> > > and there. I'm hoping to keep the call to 30 minutes.
> > > 
> > > The flip side is that I'm expecting a somewhat lengthy call the week 
> > > after -- probably on the order of 90 minutes. Please let us know as soon 
> > > as you can if you cannot make our call on Oct 2.
> > > 
> > > Lee
> > > 
> > 
> 
Received on Sunday, 23 September 2007 07:09:59 GMT