Re: [xml-dev] Serialization of XDM - Use cases / Proposal from David A. Lee on 2009-09-29 (xproc-dev@w3.org from September 2009)

From: David A. Lee <dlee@calldei.com>
Date: Tue, 29 Sep 2009 10:31:45 -0400
To: Michael Kay <mike@saxonica.com>
CC: "'Philippe Poulard'" <philippe.poulard@sophia.inria.fr>, "'Kurt Cagle'" <kurt.cagle@gmail.com>, rjelliffe@allette.com.au, xml-dev@lists.xml.org, "'XProc Dev'" <xproc-dev@w3.org>
Message-ID: <4AC21A51.4040604@calldei.com>

Thanks Michael.  In my mind, for this spec, ease of implementation 
trumps borrowing from existing specs if you can't in fact reuse existing 
technology.
E.g., from what I can see,
  <xsl:sequence select="xs:positiveInteger('5')"/>

has the advantage of borrowing from an existing spec (XSLT), but that is 
outweighed by the complexity of implementing a parser for, say,
    "xs:positiveInteger('5')"

Not only would one have to write a parser for
       <TYPE> '(' value ')'

(not too hard)
But because it is borrowing from existing specs, the implication would be 
that we'd have to parse *anything* that could otherwise be in select="...".
That would require a full-blown XPath 2.0 parser.
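
To make the "not too hard" case concrete, here is a minimal sketch in Python of a parser that accepts only the <TYPE> '(' value ')' form and rejects everything else. The function name parse_constructor is hypothetical, and the sketch assumes the value is always a quoted string literal with XPath-style doubled quotes as the escape. It is nowhere near an XPath 2.0 parser, which is exactly the point.

```python
import re

# Assumption: the type is always a prefixed name such as xs:positiveInteger.
_QNAME = r"[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*"
# A string literal in single or double quotes; a doubled quote is an escape.
_LITERAL = r"'((?:[^']|'')*)'" + "|" + r'"((?:[^"]|"")*)"'
_CONSTRUCTOR = re.compile(
    r"\s*(" + _QNAME + r")\s*\(\s*(?:" + _LITERAL + r")\s*\)\s*"
)


def parse_constructor(expr):
    """Return (type_name, lexical_value) for input like xs:positiveInteger('5').

    Raises ValueError for anything outside that one narrow form.
    """
    match = _CONSTRUCTOR.fullmatch(expr)
    if match is None:
        raise ValueError("not a simple TYPE('value') expression: %r" % expr)
    type_name, single_quoted, double_quoted = match.groups()
    if single_quoted is not None:
        return type_name, single_quoted.replace("''", "'")
    return type_name, double_quoted.replace('""', '"')


print(parse_constructor("xs:positiveInteger('5')"))
```

Anything else that is legal in select="...", such as arithmetic, path expressions or function calls with several arguments, raises ValueError here.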

Now of course we could refine the spec to limit the subset of XPath 2.0 
which is allowed in the select="" to only a small set of lexical 
elements.
But then we're diverging from the original purpose of borrowing from 
existing specs, which is that they are familiar, we don't have to 
re-document them, and they mean in this new spec fundamentally what they 
meant in the spec we're borrowing from.

Given that, I'm going to suggest atomic values be represented as a new 
element like
    <atomicValue value="5" type="xs:positiveInteger"/>

But try to reuse the other suggested XSLT elements for documents, 
elements, etc.
All in a new namespace (because they are not actually the same thing 
as xsl: even if they are based on it).
I'm a bit on the fence about whether atomic values should have the value 
as an attribute or as the body.
Attributes make sense for small values like the above, but what about the 
very common case of huge text?

    <atomicValue value="This is a huge block of text
    ....
    1000000 lines later"  type="xs:string" />

Gives me the impression the value should be in the body.

    <atomicValue type="xs:string">This is a huge block of text
    ....
    1000000 lines later</atomicValue>

Could always support both variants :(
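
For what supporting both would cost a reader, here is a rough sketch in Python using the standard xml.etree.ElementTree module. The function name read_atomic_value is hypothetical, the element is left un-namespaced because no namespace URI has been chosen yet, and rejecting an element that carries both an attribute value and a body is my assumption, not something settled in this thread.

```python
import xml.etree.ElementTree as ET


def read_atomic_value(xml_text):
    """Return (type_name, lexical_value) from one <atomicValue> element.

    Accepts either value="..." or a text body, but not both at once.
    """
    elem = ET.fromstring(xml_text)
    if elem.tag != "atomicValue":
        raise ValueError("expected <atomicValue>, got <%s>" % elem.tag)
    type_name = elem.get("type")
    if type_name is None:
        raise ValueError("missing type attribute")
    attr_value = elem.get("value")
    body = elem.text or ""  # elem.text is None for an empty element
    if attr_value is not None and body:
        raise ValueError("value given both as attribute and as body")
    return type_name, attr_value if attr_value is not None else body


print(read_atomic_value('<atomicValue value="5" type="xs:positiveInteger"/>'))
print(read_atomic_value('<atomicValue type="xs:string">big text</atomicValue>'))
```

One practical difference between the two variants: an XML parser normalizes literal line breaks inside an attribute value to spaces, so multi-line text only survives unchanged in the body form (or with character references in the attribute).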

David A. Lee
dlee@calldei.com  
http://www.calldei.com
http://www.xmlsh.org
812-482-5224

Michael Kay wrote:
>
> The first question I have in mind is how do we parse this.  This one 
> example of Michael's has me a little confused:
>
>     <xsl:sequence select="xs:positiveInteger('5')"/>
>
> This is the proposal for how to represent a typed atomic value.    
> This is pretty obscure to my novice eyes.  Reading this I wouldn't 
> guess off hand that this means "Atomic value, type xs:positiveInteger, 
> text value '5'". 
>  
> Well, I think its merit is that it's familiar syntax and semantics 
> for anyone who knows XSLT 2.0 or wants to go and read the spec. For 
> many purposes, however, a more convenient syntax would be <atomicValue 
> value="5" type="xs:positiveInteger"/>.  
>
>
> That then leads me to the final question.  Suppose we transform this 
> serialized form "almost an xslt" format, into "real xslt" format, then
> run a real XSLT 2.0 parser on it.  How to get the resulting values out ?
>
> Please bear with me as I'm very much a novice at XSLT ... maybe the 
> answer is "obvious".
> XSLT 2.0 claims that the result of an XSLT transformation can be a 
> 'set of result trees'.
> That's an XDM sequence. (???)
>  
> No: XQuery can produce any XDM sequence as output (well, almost any - 
> it can't for example generate unparsed entities); but XSLT can only 
> produce a set of document nodes. You can write an xsl:function to 
> produce any XDM sequence as its result, but you would need a 
> processor-specific way of invoking the function and capturing its 
> result in the external environment.
>  
> Incidentally, I was reminded of this project in some work with a 
> client yesterday. They are running MarkLogic queries and feeding the 
> result into Saxon, currently via lexical XML (it has to be serialized 
> because it's on a different machine). In this kind of scenario it 
> would be nice to transfer a typed document, but we really don't want a 
> five-fold increase in document size over the lexical XML. Size does 
> matter.
>
> Regards,
>
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay
>

Received on Tuesday, 29 September 2009 14:45:18 UTC