Re: [xml-dev] Serialization of XDM - Use cases / Proposal

From: Kurt Cagle <kurt.cagle@gmail.com>
Date: Sun, 20 Sep 2009 18:13:43 -0700
Message-ID: <6fa681b10909201813m6b69875ct219b9977866f0e3d@mail.gmail.com>
To: "David A. Lee" <dlee@calldei.com>
Cc: Michael Kay <mike@saxonica.com>, rjelliffe@allette.com.au, xml-dev@lists.xml.org, XProc Dev <xproc-dev@w3.org>

> Kurt, could you expand on what you think might be the advantages of a format
> such as your example?
> (<?xml version="1.0"
> encoding="UTF-8"?>,"foo",5^positiveInteger,<bar><bat/></bar>,<!-- foo -->)
> I'm not at all opposed to multiple new serialization formats, although I'm
> inclined to think getting *one* more with any decent adoption is an ambitious
> goal, let alone 2.

I figure in for a penny, in for a pound. Sequence serialization, with or
without XDM, is a problem that will only become more prevalent as we start
to use sequences of documents more routinely. I'm thinking especially here
of the XQuery and XProc use-cases, both of which are fully capable of
generating sequences. Changing serialization formats at this stage is a lot
easier because of relatively early adoption of these technologies, but that
will change as both become more commonplace.

> Your example with RNG is interesting, but I don't think it's quite a
> parallel.   With RNG the non-xml form is intended to be authored by humans,
> with a design goal of simple human editable representations.   In this case,
> so far none of the design goals (or use cases) I've come up with yet involve
> humans authoring the data.
> In your example, what is the design intention for a non-xml format?

I'm not necessarily saying that there is one, only that if you introduce a
formal xdm: or xml: notation for handling serialization/deserialization,
this will then need to be processed in some manner in order to create the
internal sequence representation, which implies a pre-process step of some
sort, whether xml or not. The non-xml format has the advantage of
compactness - for some people this is a consideration (for me it probably
wouldn't be, but there are people for whom this is a big factor).

> In my mind, there is one example where non-xml format for sequences would
> be very useful but I'm not satisfied with how it would actually work in
> practice.
> That is, I believe the most common actual production of XDM data happens to
> be either plain text, or a single XML item (element, document).
> In both of those cases it would be really nice if the serialization
> happened to be the 'standard' serialization for those without any kind of
> wrapping at all,
> (no (  )  or no <xdm:wrapper> .. etc)
> That way if you just happened to produce a single XDM Item of type element
> or text there'd be no extra baggage.
> I think that would be really cool.   But the only way I've thought of to
> achieve that is to use a sequence delimited format with no start and end
> markers.

I don't really think that singleton content is that much of an issue. You
can serialize most singletons now without needing any additional content,
but it's worth noting here that such content is, by definition, scalar and
dimensionless. If you had an xdm serialization, it actually might make sense
to have an <xdm:wrapper> around such singletons, if only because this could
be used to provide type information. There is also a distinct difference
between a naked singleton and a singleton entry of a sequence - the former
would just be the xml representation (with or without the encoding header),
the latter would be an xdm:wrapper (or xdml:sequence) element surrounding
the sequence itself.
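
To make that distinction concrete, here is a rough sketch in Python (chosen purely for illustration). The namespace URI, the xdm:sequence and xdm:item element names, and the type attribute are all hypothetical; nothing like this has been standardized. A bare item is emitted as-is, while a list, even a list of one, gets the wrapper along with type information.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; no such namespace has been standardized.
XDM_NS = "http://example.org/xdm"
ET.register_namespace("xdm", XDM_NS)


def type_name(item):
    """Return an illustrative type label for a Python stand-in of an XDM item."""
    if isinstance(item, ET.Element):
        return "element()"
    if isinstance(item, int):
        return "xs:integer"
    return "xs:string"


def serialize(items):
    """Serialize a list as a wrapped sequence; emit a bare item unwrapped."""
    if not isinstance(items, list):
        # Naked singleton: just the ordinary serialization, no wrapper.
        if isinstance(items, ET.Element):
            return ET.tostring(items, encoding="unicode")
        return str(items)
    seq = ET.Element(f"{{{XDM_NS}}}sequence")
    for item in items:
        # Each entry carries its type so a reader can rebuild the item.
        entry = ET.SubElement(seq, f"{{{XDM_NS}}}item", {"type": type_name(item)})
        if isinstance(item, ET.Element):
            entry.append(item)
        else:
            entry.text = str(item)
    return ET.tostring(seq, encoding="unicode")
```

Calling serialize(bar) on a single element gives plain XML with no extra baggage, whereas serialize([bar]) gives the wrapped form; that is the naked singleton versus singleton-entry-of-a-sequence difference.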

> My opinion is that if I'm going to have to parse "(" and "," I'd rather be
> parsing "<wrapper> ... </wrapper>" at least I wouldn't have to write a new
> (if even simple) parser and can simply read it as XML.   For example I would
> like to provide a 'sample implementation' of the serialize and parser
> written in pure XQuery as an additional way of describing the format besides
> prose.
> But perhaps you're thinking of a use case or design goal I have neglected.

Not really. Either way, you'd have to define an EXPath (or fill-in-the-blank
standard) set of xdm:serialize() and xdm:parse() functions in order to map into
internal XDM representations. The XML representations are complicated only
by the fact that there are no consistent serialization or parse mechanisms in
the fn: namespace. eXist's declare-option function would be the closest (and
I think there's something analogous in Mark Logic), but otherwise you'd have
to manually walk the tree for each serialization.
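
As a rough idea of what the parse side of such a function pair might do, here is a sketch in Python (for illustration only, not XQuery). The wrapper format, the namespace URI and the type labels are assumptions of mine, not part of any existing specification; the point is only that an XML wrapper can be read with an ordinary XML parser and then walked once to rebuild the items.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; no such namespace has been standardized.
XDM_NS = "http://example.org/xdm"


def parse(text):
    """Rebuild a list of items from the hypothetical wrapper format."""
    root = ET.fromstring(text)
    if root.tag != f"{{{XDM_NS}}}sequence":
        # Not wrapped: treat the document as a naked singleton element.
        return [root]
    items = []
    for entry in root:
        declared = entry.get("type")
        if declared == "element()":
            items.append(entry[0])          # the wrapped element itself
        elif declared == "xs:integer":
            items.append(int(entry.text))   # typed atomic value
        else:
            items.append(entry.text or "")  # fall back to a string
    return items


# Sample input in the assumed wrapper format.
sample = (
    '<xdm:sequence xmlns:xdm="http://example.org/xdm">'
    '<xdm:item type="xs:string">foo</xdm:item>'
    '<xdm:item type="xs:integer">5</xdm:item>'
    '<xdm:item type="element()"><bar><bat/></bar></xdm:item>'
    '</xdm:sequence>'
)
items = parse(sample)
```

The type attribute is what lets the integer come back as an integer rather than as text, which is the main argument for keeping a wrapper even around simple content.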

Received on Monday, 21 September 2009 01:14:23 GMT
