
Re: [xml-dev] Serialization of XDM - Use cases / Proposal

From: Kurt Cagle <kurt.cagle@gmail.com>
Date: Sun, 20 Sep 2009 16:17:05 -0700
Message-ID: <6fa681b10909201617h657b33bdk109e6b542e34be4a@mail.gmail.com>
To: Michael Kay <mike@saxonica.com>
Cc: rjelliffe@allette.com.au, xml-dev@lists.xml.org, XProc Dev <xproc-dev@w3.org>
I'm aware of most of the implications of this format, but I still
think it's one that's worth considering.

For purposes of discussion, suppose that you split sequence serialization
from single-item serialization into non-XML formats, because I believe they
are qualitatively different problems. Referring only to the sequence
serialization side of the problem here, I think the question is whether XML
sequence serialization and parsing in fact has to be consumable by an XML
parser. As I see it, you either end up specifying some arbitrary set of
privileged XML sequence tags:

<?xml version="1.0" encoding="UTF-8"?>
<xml:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xml:item value="foo" type="xs:string"/>
    <xml:item value="5" type="xs:positiveInteger"/>
    <xml:item type="document"><bar><bat/></bar></xml:item>
    <xml:item type="comment">foo</xml:item>
</xml:sequence>

or you work with a direct serialization as described earlier, possibly with
RDF encodings for type:

(<?xml version="1.0" encoding="UTF-8"?>,"foo",5^positiveInteger,<bar><bat/></bar>,<!-- foo -->)

Non-native XML items, such as binary classes invoked through extensions in
XQuery or XSLT, would be a more complex proposition, but otherwise I don't
really see where you'd have that much trouble with the notation. Any
XDM-aware application would need to be modified to handle the latter,
but I don't necessarily see that as a major issue at this stage.
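
To give a rough sense of what that modification might involve, here is a
minimal Python sketch of a scanner that splits the compact form into its
top-level items. The function name split_compact is hypothetical, and the
sketch assumes no CDATA sections and no ">" inside attribute values; it is
not a full XML-aware tokenizer, only an illustration of the extra scanning
an application would need.

```python
def split_compact(text):
    """Split a compact sequence such as ("foo",5,<a/>) into raw item strings.

    Assumptions (not part of any spec): items are separated by top-level
    commas, string literals use double quotes, and markup contains no
    CDATA sections and no '>' inside attribute values.
    """
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("sequence must be wrapped in parentheses")
    body = text[1:-1]
    if not body.strip():
        return []

    items = []
    start = 0   # index where the current item begins
    depth = 0   # nesting depth of open elements
    i = 0
    while i < len(body):
        ch = body[i]
        if depth == 0 and ch == '"':
            # Skip a string literal; a doubled quote ("") simply reads
            # as two adjacent literals, which keeps the item intact.
            i = body.index('"', i + 1) + 1
        elif body.startswith("<!--", i):
            i = body.index("-->", i) + 3      # skip a comment
        elif body.startswith("<?", i):
            i = body.index("?>", i) + 2       # skip a declaration or PI
        elif ch == "<":
            end = body.index(">", i)
            tag = body[i:end + 1]
            if tag.startswith("</"):
                depth -= 1                    # closing tag
            elif not tag.endswith("/>"):
                depth += 1                    # opening tag
            i = end + 1
        elif ch == "," and depth == 0:
            items.append(body[start:i].strip())
            start = i + 1
            i += 1
        else:
            i += 1
    items.append(body[start:].strip())
    return items


example = ('(<?xml version="1.0" encoding="UTF-8"?>,"foo",'
           '5^positiveInteger,<bar><bat/></bar>,<!-- foo -->)')
for item in split_compact(example):
    print(item)
```

Commas inside string literals or element content are left alone because the
scanner only splits when it is outside a literal and at element depth zero.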

I could see this approach mirroring the approach that RNG utilizes -
providing two equivalent representations, one in XML, the other as a compact
notation. The serializer in this case would work the way it always does -
you would describe the sequence serialization method and possibly content
type, and make a distinction between xsx - xml serialization - and xsc -
compact notation serialization.
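
As a sketch of how the two equivalent representations might be produced from
the same sequence, the following Python emits both forms. Everything here is
an assumption for illustration: the Item class, the to_xsx and to_xsc names,
and the rules that strings are quoted while other atomic values are written
as value^localTypeName. The leading XML declaration is omitted from both
outputs.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Item:
    kind: str                   # "atomic", "document", or "comment"
    value: str                  # lexical value, or already-serialized markup
    type: Optional[str] = None  # e.g. "xs:positiveInteger" for atomic items


def to_xsx(items):
    """XML form: wrap each item in the privileged xml:item element."""
    lines = ['<xml:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema">']
    for it in items:
        if it.kind == "atomic":
            lines.append(f"    <xml:item value={quoteattr(it.value)} "
                         f"type={quoteattr(it.type)}/>")
        elif it.kind == "document":
            # value is assumed to be well-formed markup already
            lines.append(f'    <xml:item type="document">{it.value}</xml:item>')
        elif it.kind == "comment":
            lines.append(f'    <xml:item type="comment">{escape(it.value)}</xml:item>')
        else:
            raise ValueError(f"unsupported item kind: {it.kind}")
    lines.append("</xml:sequence>")
    return "\n".join(lines)


def to_xsc(items):
    """Compact form: comma-separated items inside parentheses."""
    parts = []
    for it in items:
        if it.kind == "atomic" and it.type == "xs:string":
            # Assumed escaping rule: double any embedded quote.
            parts.append('"' + it.value.replace('"', '""') + '"')
        elif it.kind == "atomic":
            # Assumed rule: value^localTypeName, as in 5^positiveInteger.
            parts.append(f"{it.value}^{it.type.split(':')[-1]}")
        elif it.kind == "document":
            parts.append(it.value)
        elif it.kind == "comment":
            # Padding spaces mirror the example in this message.
            parts.append(f"<!-- {it.value} -->")
        else:
            raise ValueError(f"unsupported item kind: {it.kind}")
    return "(" + ",".join(parts) + ")"


sequence = [
    Item("atomic", "foo", "xs:string"),
    Item("atomic", "5", "xs:positiveInteger"),
    Item("document", "<bar><bat/></bar>"),
    Item("comment", "foo"),
]
print(to_xsx(sequence))
print(to_xsc(sequence))
```

Keeping one in-memory item list and two writers is what would let the
serialization method be selected by name without changing anything else.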

Kurt Cagle
Managing Editor

On Sun, Sep 20, 2009 at 2:29 PM, Michael Kay <mike@saxonica.com> wrote:

>  I'm going to ask what may be an obvious question, but wouldn't it make
> sense for a serialization of a sequence to correspond on the output to the
> serialization on the input? That is to say, if you had a structure:
> ("foo",5,<bar><bat/></bar>,<!-- foo -->)
> The main disadvantage of such a format is that it uses non-XML markup
> (parentheses and commas) which makes it difficult to parse using tools that
> are specialized to handling XML markup, for example XSLT and XQuery.
> Also, it doesn't solve the problem of retaining type annotations, for
> example the difference between the integer 5 and the positiveInteger 5.
> Regards,
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay
Received on Sunday, 20 September 2009 23:17:46 UTC