Compound Transactions, Documents, Streams, Proxies

Some thoughts

Separating Transactions From Content Delivery

In comparison to content delivery over HTTP, FTP and P2P methods, web
service transaction protocols are a less efficient means of delivering
large or complex content. Internet connections are not wholly reliable
and large file transfers regularly fail. Compared with resuming an
HTTP, FTP or P2P download, it is inefficient to repackage the content in
a new SOAP transaction or to expect the web service server to hold on to
the transaction while waiting for the client to reconnect. It is better
to separate the transaction from the content delivered, by passing URIs
and perhaps decryption keys in the web service transaction and letting
the client system fetch the content. Using straight HTTP also allows
ISPs and organizations to take advantage of transparent proxy caching.
Using P2P allows content providers to greatly reduce bandwidth costs.
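
As a rough sketch of the client side of this split, the Python below
assumes the web service reply has already handed the client a plain
HTTP URI, and resumes a partial download with an HTTP Range request
instead of restarting the transaction. The function names and the local
file path are illustrative assumptions, not part of any existing web
service toolkit, and error cases (such as a file that is already
complete) are not handled.

```python
import os
import urllib.request


def resume_request(uri, local_path):
    """Build a GET request that resumes a partial download of `uri`."""
    request = urllib.request.Request(uri)
    # Ask only for the bytes that are not already on disk.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if offset:
        request.add_header("Range", "bytes=%d-" % offset)
    return request, offset


def fetch(uri, local_path, chunk_size=64 * 1024):
    """Fetch `uri` into `local_path`, appending if the server honours Range."""
    request, offset = resume_request(uri, local_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server sent only the missing tail; any other
        # success status means the full body, so start the file over.
        mode = "ab" if offset and response.status == 206 else "wb"
        with open(local_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

The web service transaction itself stays small: it only has to carry
the URI (and any key), and a dropped connection costs the client one
more call to fetch() rather than a new transaction.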

Compound Documents Revisited

Embedding binary content inside text-based XML is wasteful, and even
text-based content embedded within XML requires some process to embed
or extract it. Why not just keep the content separate at the packaging
level? One solution is to ship an archived directory of files in a
binary format, for example a zipped package directory. This solution
was suggested back in 1999.

http://lists.xml.org/archives/xml-dev/199902/msg00101.html

It is easy to "peer into" and "grab" the content of a zip file;
Java classes and C libraries [and Apache modules] that can do this
already exist.
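
The same is true of Python's standard zipfile module. The sketch below
builds a tiny compound document in memory and then lists and extracts
its members; the member names "index.html" and "logo.png" and the
placeholder image bytes are assumptions made up for the example.

```python
import io
import zipfile


def build_example():
    """Return the bytes of a small zipped compound document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("index.html",
                         "<html><body><img src='logo.png'></body></html>")
        archive.writestr("logo.png", b"\x89PNG placeholder bytes")
    return buffer.getvalue()


def list_members(archive_bytes):
    """'Peer into' the archive: return the names of the files it holds."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()


def grab_member(archive_bytes, name):
    """'Grab' one file out of the archive without unpacking the rest."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)
```

Binary members come back byte for byte, with no recoding step of the
kind needed when binary content is embedded in XML.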

When the www-xml-packaging group formed in July 2000, after a little
prompting ...

http://lists.w3.org/Archives/Public/www-xml-packaging/2000Jul/0004.html

... a zip/jar type archive solution faced little real competition from
similar schemes that recode and embed binary content in XML.

In fact, the zipped/jarred compound document was the solution adopted
for all of Sun's OpenOffice.org / StarOffice document formats.

Compound Streams Introduced

In cases where the embedded content is generated by the same process, or
the content is better served in a timely manner, i.e. streamed
audio/video, then why not create a multichannel binary streamable
format, like Xiph.org's Ogg container format (the one used for Vorbis
audio), to carry the content over HTTP or any other existing streaming
protocol? Once delivered, the resulting single file could be cached or
saved on the client side.
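
To make "multichannel" concrete, here is a deliberately minimal Python
sketch of interleaving several channels in one byte stream. The framing
(a one-byte channel id followed by a four-byte length) is a hypothetical
layout invented for this example; it is not the Ogg format and leaves
out everything a real container needs, such as timestamps, checksums
and resynchronization.

```python
import io
import struct

# Hypothetical chunk header: channel id (1 byte), payload length (4 bytes).
HEADER = struct.Struct(">BI")


def write_chunk(stream, channel, payload):
    """Append one chunk belonging to `channel` to the stream."""
    stream.write(HEADER.pack(channel, len(payload)))
    stream.write(payload)


def read_chunks(stream):
    """Yield (channel, payload) pairs in the order they arrive."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of stream (or a truncated header)
        channel, length = HEADER.unpack(header)
        yield channel, stream.read(length)


def demultiplex(stream):
    """Reassemble each channel's bytes from the interleaved stream."""
    channels = {}
    for channel, payload in read_chunks(stream):
        channels.setdefault(channel, bytearray()).extend(payload)
    return {channel: bytes(data) for channel, data in channels.items()}
```

Because chunks are self-describing, a receiver can act on each channel
as data arrives and still save the whole stream as a single file.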

Implementing Compound Documents and Streams for Client Side Web Browsers

Any introduction of a compound document format or compound stream format
would require either modification of the client-side browser or the use
of a proxy server which expands and separates the content and delivers
it to the conventional client.

For both the Jar'ed/Zip'ed compound document and the compound stream
format, a client side HTTP proxy could download the archive or stream
and expand the content, delivering it to the user's existing web browser.

To the web browser the expanded content looks as if it originates from
"http://www.contentprovider.uri/basedirectory/compoundfilename.affix/"
with meta info inside the archive/stream defining the base URI as 
"http://www.contentprovider.uri/basedirectory/"
and another file "index.html" being the base HTML content. The other
content would appear to be relative to the base URI. This means that the
content can still link and applets can interoperate with the website
with the same browser scripting security privileges. 
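
One way a client-side proxy might do this mapping is sketched below in
Python: a URL under the archive URL plus a slash is resolved to a member
of the archive, with "index.html" as the default. The function name is
an assumption for illustration, and the sketch ignores the base URI
meta info described above, treating every path as relative to the
archive URL itself.

```python
import posixpath
from urllib.parse import urlsplit, unquote


def member_for(request_url, archive_url):
    """Map a browser request to an archive member name.

    Returns None when the request is for the archive itself (no
    trailing slash), so the caller can hand over the whole file.
    """
    archive = urlsplit(archive_url)
    request = urlsplit(request_url)
    archive_path = archive.path.rstrip("/")
    if request.netloc != archive.netloc:
        raise ValueError("URL is on a different host")
    if request.path == archive_path:
        return None
    if not request.path.startswith(archive_path + "/"):
        raise ValueError("URL is outside the compound document")
    relative = unquote(request.path[len(archive_path) + 1:])
    member = posixpath.normpath(relative) if relative else "."
    if member == ".":
        return "index.html"  # the base HTML content
    if member == ".." or member.startswith("../"):
        raise ValueError("URL escapes the compound document")
    return member
```

For example, a request for ".../compoundfilename.affix/images/a.png"
resolves to the member "images/a.png", while the bare
".../compoundfilename.affix" is left for download as a single file.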

Using the proxy system also introduces the possibility of transparently
including Peer to Peer systems to save the content provider bandwidth
costs.

Embedded web pages may contain a URI to the compound document, without
the trailing slash,
"http://www.contentprovider.uri/basedirectory/compoundfilename.affix",
so the user may save the compound document to their file system.

Using unique filename affixes and/or MIME declarations, the desktop
operating environment can "associate" the compound document formats with
the client HTTP proxy server.

Implementing Compound Documents and Streams on the Server Side

As with the client side, any introduction of a compound document format
or compound stream format would require either modification of the
server or the use of a proxy system to gather the separate content and
bundle it together.

For Jar'ed/Zip'ed compound documents it should be possible for a proxy
system to "request" the content from a conventional web server. The
proxy would then just zip up the resulting directory of files and send
it to the client. Compound streams could be served using the same method
but multi-threaded, delivering the content in real time.
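
The "zip up the resulting directory" step could look roughly like the
Python below. It assumes the proxy has already fetched the content into
a local directory; how that fetch is done, and the function name
bundle_directory, are assumptions left outside this sketch.

```python
import io
import os
import zipfile


def bundle_directory(root):
    """Zip every file under `root` and return the archive as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, subdirs, files in os.walk(root):
            subdirs.sort()  # walk subdirectories in a stable order
            for filename in sorted(files):
                full_path = os.path.join(folder, filename)
                # Store paths relative to the root, with forward slashes,
                # so members line up with relative URLs in the HTML.
                member = os.path.relpath(full_path, root).replace(os.sep, "/")
                archive.write(full_path, member)
    return buffer.getvalue()
```

The returned bytes can be sent as the body of a single HTTP response,
which keeps the conventional web server itself unmodified.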

Document Object Model Access

The proxy system, together with the recent DOM Level 3 Load and Save
recommendation
http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407/load-save.html
could be used to access and even change content embedded within
compound documents and compound streams.

-- 
David Mohring <heretic@ihug.co.nz>

Received on Saturday, 10 April 2004 01:49:29 UTC