[Bug 27763] Serialization method for reconstructing serialized query results from bugzilla@jessica.w3.org on 2015-01-07 (public-qt-comments@w3.org from January 2015)

From: <bugzilla@jessica.w3.org>
Date: Wed, 07 Jan 2015 19:48:25 +0000
To: public-qt-comments@w3.org
Message-ID: <bug-27763-523-OTFLQJ2Z94@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27763

Jim Melton <jim.melton@acm.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |jim.melton@acm.org
         Resolution|---                         |LATER

--- Comment #1 from Jim Melton <jim.melton@acm.org> ---
In their joint teleconference of 2015-01-06, the XML Query WG and XSLT WG
agreed that the feature described in the referenced bug 27498 (repeated below
for organizational purposes) is highly desirable, but also agreed that the
requirement became known too late in the process for Serialization 3.1.  This
bug is being marked RESOLVED/LATER now, and will be immediately re-opened as a
bug against future versions. 

Bug 27498, comments 5-10 are copied here:

Comment 5 Michael Kay 2015-01-01 11:24:07 UTC

I've just had a request from a Saxon user which suggests an additional
requirement: they are interested in serializing the query result (an arbitrary
XDM value) not for human consumption, but for transmission to a client
application that can reconstruct the XDM value from its serialized form. This
suggests additional requirements such as including the type of an atomic value,
not just its string value.
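
A minimal sketch of the idea, not a proposed format: each atomic value is
written as a pair of type name and lexical form, so that a reader can rebuild a
typed value rather than a bare string. The JSON layout, the function names
`serialize_sequence` and `parse_sequence`, and the small type table are all
assumptions made for illustration; only a handful of types are covered, and
Python's string forms stand in for the XSD lexical forms.

```python
import json
from decimal import Decimal

# Hypothetical table: a few XSD type names and how to rebuild a value from text.
_PARSERS = {
    "xs:string": str,
    "xs:integer": int,
    "xs:decimal": Decimal,
    "xs:double": float,
    "xs:boolean": lambda s: s == "true",
}


def _type_name(value):
    # bool is tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "xs:boolean"
    if isinstance(value, int):
        return "xs:integer"
    if isinstance(value, Decimal):
        return "xs:decimal"
    if isinstance(value, float):
        return "xs:double"
    if isinstance(value, str):
        return "xs:string"
    raise TypeError(f"unsupported atomic value: {value!r}")


def _lexical(value):
    # Booleans get the XSD spelling; other values use Python's str() form.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def serialize_sequence(items):
    """Write each item as {"type": ..., "value": ...} in a JSON array."""
    return json.dumps([{"type": _type_name(v), "value": _lexical(v)} for v in items])


def parse_sequence(text):
    """Rebuild the typed items from the output of serialize_sequence."""
    return [_PARSERS[e["type"]](e["value"]) for e in json.loads(text)]
```

With only string values on the wire, the integer 1, the decimal 1.0 and the
string "1" could not be told apart reliably; with the type name attached they
can.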

Comment 6 Hans-Juergen Rennau 2015-01-02 22:04:07 UTC

(In reply to Michael Kay from comment #5)
> I've just had a request from a Saxon user which suggests an additional
> requirement: they are interested in serializing the query result (an
> arbitrary XDM value) not for human consumption, but for transmission to a
> client application that can reconstruct the XDM value from its serialized
> form. This suggests additional requirements such as including the type of an
> atomic value, not just its string value.

Incidentally, David Lee has argued for this additional requirement (and for
XDM serialization in general) for several years, e.g. on the XQuery talk list
and at two Balisage conferences, alas without receiving any attention or
response.

Comment 7 C. M. Sperberg-McQueen 2015-01-05 18:56:09 UTC

If we adopt round-trippability as a requirement (as implicitly suggested at
least for arrays and maps in comment 5 and endorsed in comment 6), does the
requirement also apply to XML data?

One story that would be simple to tell would be:  serialize it using a new
serialization method, and then you will be able to reconstitute an isomorphic
collection of XDM data from the serialization.  We seem to be missing a couple
of things here:

1. a way to annotate XML nodes with type information that can be reliably
reconstituted (as long as all the appropriate in-scope schema information is
available) -- remember that revalidating with the in-scope schema starting at
the root of each maximal XDM tree is not guaranteed to produce the same
results;

2. a way to read the serialized data and re-type everything the same way.

It's not clear at first glance how best to add the type annotations required
for reliable write + read round tripping for either JSON or XML, without
getting in the way of non-XDM systems.

And it would be nice to be able to serialize the entire collection of data
without loss; but that involves being able to handle parentless attributes and
functions (and possibly other things I'm forgetting at the moment).  Or is
there a plausible subset of XDM for which reliable write + read round-tripping
can be easily defined and which will suffice for all imaginable purposes?  all
rational imaginable purposes?  all rational purposes that don't involve
meta-programming or other unusual or unnatural acts?  most rational purposes? 
many purposes?  

I thank Hans-Jürgen Rennau for pointing to some earlier discussions that have
not been raised as bugs or enhancement requests in Bugzilla; I'll have to
refresh my memory to see if solutions have already been suggested for these
problems.

Comment 8 Michael Kay 2015-01-05 23:14:30 UTC

I don't think it's difficult to define an XML representation of the full XDM
model, but I doubt it would be very human-readable, so it's a very different
objective from the original requirement of this thread.

Parsing that XML representation to reconstitute the XDM would not be possible
using pure XSLT and XQuery programs because the only way we allow type
annotations to be set is by using validate expressions. But we could define a
magic function to do it.

Comment 9 Christian Gruen 2015-01-06 11:47:37 UTC

I agree that a serialization method that allows users to reconstruct original
query results would be helpful. As the "adaptive" serialization method is
probably not the best target for all that, I have just added a new bug entry
for further discussion (Bug 27763).

Comment 10 David Lee 2015-01-06 20:55:13 UTC

A few years back I started a discussion on this and created a wiki with quite a
few of these issues.

http://xml.calldei.com/XDMSerialize


Mike encouraged me to start with Use Cases ... and I definitely agree.
For example, a primary use case I have is "streaming" XDM producers, such as
those producing log messages or long-lived sessions.
The Efficient XML group (while I was on it) had several real-world "customers"
who needed this as well (but for efficient XML), and the solution is non-ideal.
One example was for "Instant Message" applications. Each message is an XML
element, but the entire stream is a long-lasting document because there is no
standardized way of representing streams of XML documents ... The requirement
is to get each message without over-reading the socket ... That may be an edge
case, but consider typical "feeds" ... Twitter, Facebook, stocks, news, message
queues.
Or simply log files ... how to parse a log file before it's "done" ...
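
One common way to meet the "no over-reading" requirement is to put a length
prefix in front of each message. The sketch below is only an illustration of
that general technique; it is not the approach taken by any of the groups
mentioned, and the 4-byte big-endian header and the function names
`write_message` and `read_message` are assumptions.

```python
import io
import struct


def write_message(stream, payload: bytes) -> None:
    """Write one message as a 4-byte big-endian length followed by the bytes."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def _read_exact(stream, n: int) -> bytes:
    # Read exactly n bytes; short reads are retried, early end is an error.
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended in the middle of a message")
        buf += chunk
    return buf


def read_message(stream):
    """Return the next message, or None at a clean end of stream.

    Only the header and the announced number of bytes are consumed, so
    whatever follows the message stays unread in the stream.
    """
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        header += _read_exact(stream, 4 - len(header))
    (length,) = struct.unpack(">I", header)
    return _read_exact(stream, length)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_message(buf, b"<msg>one</msg>")
    write_message(buf, b"<msg>two</msg>")
    buf.seek(0)
    print(read_message(buf))
    print(read_message(buf))
```

The reader can hand each message to a parser as soon as it arrives, without
waiting for the stream (or the log file) to be finished.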

This site is still up and discusses many of the use cases I considered.  I put
this on hold when I realized I didn't have a clean solution to item types like
maps or functions, and that some use cases have contradictory requirements such
as full fidelity vs minimal output.
An example: do you really need to expose node identity? Without it you can't
reconstruct the XDM perfectly, but is that needed? For what cases?


It's good to see some renewed interest in this topic.

If we can't get XDM (of some sort ...) in and out of our XDM tools using some
format that has a reasonable chance of being recognized by another tool set
... that's a big barrier ... To me, the "human readable text output" is
interesting but not that problematic, as any vendor can solve that differently
... (humans are tolerant of differences).

Received on Wednesday, 7 January 2015 19:48:27 UTC