Re: Fwd: LOPT: serialization algorithm suggestions from Eric Prud'hommeaux on 2000-04-17 (xml-dist-app@w3.org from April 2000)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 17 Apr 2000 17:52:50 -0400
To: Constantine Plotnikov <cap@mail.novosoft.ru>
Cc: xml-dist-app@w3.org
Message-ID: <20000417175249.A27982@w3.org>
On Mon, Apr 17, 2000 at 04:30:47PM +0700, Constantine Plotnikov wrote:
> Eric Prud'hommeaux wrote:
> > 
> > On Fri, Apr 14, 2000 at 04:17:04PM +0700, Constantine Plotnikov wrote:
> > > Hi!
> > >
> > > 1. Could you please reconsider the serialization algorithm?
> > > As far as I understand from your site, SOAP is the starting
> > > point for LOTP development.
> > >
> > > We had some problems with implementing the SOAP protocol in Java.
> > > The algorithm requires two passes for serialization and
> > > deserialization.
> > >
> > > The basic idea I suggest is the same as in the Java and XMI 1.1
> > > serialization algorithms.
> > >
> > > When an object is serialized, it is assigned an id and written
> > > as:
> > > <Type id="id0" >
> > >   // contents
> > > </Type>
> > >
> > > Later (in the body of the the element or or ), when reference is
> > > encountered, empty element with href is used.
> > >
> > > <List id="i0">
> > >   <Type id="i1" >
> > >     <value>
> > >       <Element id="i2">
> > >         <parent>
> > >           <Type href="i1"/>
> > >         </parent>
> > >       </Element>
> > >     </value>
> > >   </Type>
> > >   <Type id="id0"/>
> > > </List>
> > >
> > > This allows single-pass serialization/deserialization and references
> > > to the parent. I do not suggest using exactly this representation
> > > for the protocol. For example, an XMI 1.1-like optimization for the
> > > representation of values may be used. I just want to make
> > > (de)serialization simple and single pass.
> > 
> > I'm very interested in building an object model that supports graphs
> > without reading a supporting schema. Let's take a rigorous example in
> > C. (I don't use Java because it glosses over the distinction between
> > pointers and nested data.)

First of all, let me stress that I'm interested in, but not convinced
about, the utility of this structural serialization. My hope is that
the serialization mechanism will be useful in more areas than RPC. If
the cost of being able to specify the actual structure is minimal in
situations where it is not needed, it may be worth stating the
serialization in a basic serialization protocol.

I have written a mechanism like this into LOTP
[http://www.w3.org/2000/03/31-LOTP-Architecture] for use as a strawman
in pro/con discussions.

> > typedef struct {
> >   int i;
> >   char c;
> > } t_Foo;
> > 
> > typedef struct {
> >   char * str;
> >   t_Foo * fooPointer;
> > } t_Bar;
> > 
> > typedef struct {
> >   char * str;
> >   t_Foo nestedFoo;
> > } t_Baz;
> > 
> > t_Baz myBaz = {"there", {9, 'c'}};
> > t_Bar myBar = {"hi", &myBaz.nestedFoo};
> > 
> > If we use something like a hashtable to tell which objects we've
> > serialized, and we are called on to serialize(&myBar, &myBaz), we can
> > write the myBar structure (ignoring SOAP serialization for now):
> > 
> > <t_Bar LOTP:name="myBar">
> >   <XMLSchema:string LOTP:name="str">hi</XMLSchema:string>
> >   <t_Foo LOTP:type="pointer" LOTP:objectID="t_Foo_0">
> >     <XMLSchema:int LOTP:name="i">9</XMLSchema:int>
> >     <XMLSchema:char LOTP:name="c">c</XMLSchema:char>
> >   </t_Foo>
> > </t_Bar>
> > 
> > <t_Baz LOTP:name="myBaz">
> >   <XMLSchema:string LOTP:name="str">there</XMLSchema:string>
> >   <t_Foo LOTP:objectID="t_Foo_0"/>
> > </t_Baz>
> > 
> > I used objectID to make sure it was clear I was identifying the object
> > being generated, not the XML element where it happened to be
> > serialized. This XML element may have another name to make it
> > available to XSLT or something like that.
> > 
> > I'll flesh this example out with actual XML schema conformance and
> > other tasty tidbits.
> > 
> > This example would have been more convenient if we were supposed to
> > serialize(&myBaz, &myBar) as the t_Foo is actually nested in t_Baz,
> > but that wouldn't be as rigorous.
> >
> serialize(&myBaz, &myBar) without a schema would be very difficult.
> C does not have reflective facilities like Java. How will serialize()
> learn what it is serializing and its structure?

I had imagined that the object serializing the data would be
hard-coded to the structure, and that the generality of expression
would pay off when a generic receiving agent was able to construct the
data objects to pass to a handler. The handler would know that the
memory image it was passed was actually a t_Bar and say:

LOTPResult LOTP_Bar_Handler (LOTPContext * c, int argc, void * argv[]) {
    t_Bar * myBar = (t_Bar *)argv[0];
    LOTPDeferred * reply = new LOTPDeferred("I'll get back to you.");
    return reply;
}
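
To make the sending side concrete, here is a rough sketch of what a
hard-coded serializer for the structs above might look like. It is an
illustration only, not code from LOTP: the helper names (lookup, emit,
write_foo, serialize_bar, serialize_baz), the fixed-size address table
standing in for the hashtable, and the use of LOTP:objectHref for a
back-reference (a name borrowed from the anchor in the architecture
document URL) are all assumptions. The struct strings are declared
const here, and the sketch does no XML escaping and assumes non-NULL
pointers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct { int i; char c; } t_Foo;
typedef struct { const char *str; t_Foo *fooPointer; } t_Bar;
typedef struct { const char *str; t_Foo nestedFoo; } t_Baz;

/* Hypothetical stand-in for the hashtable: addresses already written.
   The index of an address doubles as its object ID. */
#define MAX_SEEN 16
static const void *seen[MAX_SEEN];
static int seen_count = 0;

/* Return the ID assigned to addr, or -1 if it has not been written yet. */
static int lookup(const void *addr) {
    for (int k = 0; k < seen_count; k++)
        if (seen[k] == addr) return k;
    return -1;
}

/* Append formatted text to the NUL-terminated buffer out (capacity cap),
   truncating if it does not fit. */
static void emit(char *out, size_t cap, const char *fmt, ...) {
    size_t used = strlen(out);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(out + used, cap - used, fmt, ap);
    va_end(ap);
}

/* First visit to an address writes the fields and assigns an objectID;
   any later visit to the same address writes only a reference. */
static void write_foo(char *out, size_t cap, const t_Foo *foo, int is_pointer) {
    int id = lookup(foo);
    if (id >= 0) {
        emit(out, cap, "  <t_Foo LOTP:objectHref=\"t_Foo_%d\"/>\n", id);
        return;
    }
    assert(seen_count < MAX_SEEN);
    id = seen_count;
    seen[seen_count++] = foo;
    emit(out, cap, "  <t_Foo%s LOTP:objectID=\"t_Foo_%d\">\n",
         is_pointer ? " LOTP:type=\"pointer\"" : "", id);
    emit(out, cap, "    <XMLSchema:int LOTP:name=\"i\">%d</XMLSchema:int>\n", foo->i);
    emit(out, cap, "    <XMLSchema:char LOTP:name=\"c\">%c</XMLSchema:char>\n", foo->c);
    emit(out, cap, "  </t_Foo>\n");
}

/* Hard-coded to t_Bar: the t_Foo is reached through a pointer. */
void serialize_bar(char *out, size_t cap, const char *name, const t_Bar *bar) {
    emit(out, cap, "<t_Bar LOTP:name=\"%s\">\n", name);
    emit(out, cap, "  <XMLSchema:string LOTP:name=\"str\">%s</XMLSchema:string>\n", bar->str);
    write_foo(out, cap, bar->fooPointer, 1);
    emit(out, cap, "</t_Bar>\n");
}

/* Hard-coded to t_Baz: the t_Foo is nested data. */
void serialize_baz(char *out, size_t cap, const char *name, const t_Baz *baz) {
    emit(out, cap, "<t_Baz LOTP:name=\"%s\">\n", name);
    emit(out, cap, "  <XMLSchema:string LOTP:name=\"str\">%s</XMLSchema:string>\n", baz->str);
    write_foo(out, cap, &baz->nestedFoo, 0);
    emit(out, cap, "</t_Baz>\n");
}
```

Calling serialize_bar and then serialize_baz on the myBar and myBaz
values above, with an empty buffer, writes the t_Foo fields once inside
the t_Bar element and only a reference inside the t_Baz element, all in
a single pass.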

> What you are suggesting is possibly some sort of security hole.
> The detail of embedding is pretty low level and RPC protocols usually
> do not consider such details. It looks like in your example you are
> talking not about objects, their references and their representation
> in the protocol, but about memory, pointers and their representation
> in the protocol. I do not think it will be an easy task to prove the
> security properties of anything that works with pointers.
> 
> I think that this feature would not be needed for Java, Scheme,
> Smalltalk, Prolog, to name a few that I know well. My knowledge of
> Perl and ASP basic is more limited, and I would ask others to
> comment on it. I would like to see a good practical example where
> this feature would bring significant benefits.

I had not considered proofs while writing up this idea. One possible
solution is to say: "There is a canonical way to serialize
typed memory objects. An agent may have a site policy that does not
permit their use and may not even implement their functionality. Agents
written in reference-based languages will likely elect not to process
these idioms."

I believe that phrasing it this way drops the incremental
implementation cost to near zero for most platforms.
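
As a rough illustration of how small that cost could be, an agent that
declines these idioms might need little more than one attribute check.
The sketch below is an assumption, not part of LOTP: the SitePolicy
enum and the element_permitted function are hypothetical names, and it
presumes the receiver has already extracted the LOTP:type attribute
value (NULL when absent).

```c
#include <string.h>

/* Hypothetical site policy: are typed-memory (pointer) idioms accepted? */
typedef enum { REJECT_POINTERS, ALLOW_POINTERS } SitePolicy;

/* Decide whether an element may be processed, given the value of its
   LOTP:type attribute (NULL when the attribute is absent). */
int element_permitted(SitePolicy policy, const char *lotp_type) {
    if (lotp_type != NULL && strcmp(lotp_type, "pointer") == 0)
        return policy == ALLOW_POINTERS;
    return 1;  /* elements without the pointer idiom are unaffected */
}
```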

> I can say nothing of the RPC protocol that was named "RPC" (it had
> 64bit ports that should be reserved, and the ports are published in
> RFCs) because I do not have experience with it. It was accessible
> mainly from C, so it might have some hacks to address the issue.
> If someone has worked with it, please tell us about it.

Indeed - I'm interested in opinions and experience here.

> But in all other RPC systems I have seen, objects were of two kinds:
> values and references to remote objects. The requests were isolated,
> and it was not possible to reference non-remote objects that are
> outside of the request. Maybe the C CORBA interface would be a good
> place to study these issues more for C (I have worked with CORBA from
> C++ and Java).

[http://www.w3.org/2000/03/31-LOTP-Architecture#serialization_identifiers_objectHref]
addresses the use of URIs to identify objects within or outside of the
document.

> Constantine
> 
> BTW, there were a lot of people in the CC. I think that some
> of them are subscribed to xml-dist-app@w3.org. I copied the CC
> exactly for now because I do not know why it was done, but do
> they need to receive it twice?

I poked through the dist list - they were all subscribed.

-- 
-eric

(eric@w3.org)
Received on Monday, 17 April 2000 17:52:54 UTC