A Global Web Publishing Framework

Hello,

First, I want to apologize for the intrusion and for bringing this to
your attention. I do hope you'll find this note valuable enough to
pardon me.

Sincerely,

Stefano Mazzocchi <stefano@apache.org>
Java Apache Project coordinator

--------------------------------O-------------------------------------

                A Global Web Publishing Framework
                ---------------------------------

                               by
                        Stefano Mazzocchi
                      <stefano@apache.org>

Introduction
------------

Since the release of the second XSL working draft, I've been thinking
about a way to merge such interesting technologies into current web
publishing frameworks based mainly on HTML. The Cocoon Project
(http://java.apache.org/cocoon/) was created with the ambitious goal of
changing the way web content is created, distributed and, last but not
least, maintained. XSL's ability to separate content and style into
different files, and to assign the knowledge each requires to different
people (or working groups), made the project _very_ successful and let
it reach a wide audience in a few weeks.

The need for a way to integrate server-side technologies, such as
dynamic content generation, with the XSL framework was clearly
recognized. Following the same model, the Cocoon Project is trying to
define an "eXtensible Logicsheet Language" (XLogic) that would add
server-side dynamism and batch capabilities to the tree construction
part of XSL, either by extending the XSL specification or by cloning it.

This note is mainly written to express my personal feelings about the
evolution of a "global web publishing framework" that should incorporate
not only language guidelines, but also software architectures and
implementation suggestions. I do believe that W3C is doing an
_outstanding_ job in evolving the web into a truly knowledgeable
distributed information system, but I also believe that implementation
guidelines on both client and server sides are mostly lacking (DOM is
the first spec in this direction).

In this paper, I outline my vision for a global web publishing
framework, drawing on my knowledge of server-side dynamic content
generation as well as my experience in real-life XSL deployment. As
Cocoon's author and project coordinator, I hope my implementations
stand as a proof of concept for this note.


The current XSL model
---------------------

To follow the CSS model and to be able to create the XML equivalent of
stylesheets, the XSL specification is actually the repository for three
different technologies:

 - tree construction
 - patterns
 - formatting objects

These three technologies would be best merged into a single
specification if (and only if) their use were restricted to the
operations described by the XSL goals. In this note, I will show how
these technologies are well suited to other uses that are not covered
by the XSL picture.


Overlapping goals
-----------------

The first major overlapping region is the one covered by both the "XSL
patterns" and the yet-to-be-defined "XML query language". The XSL WG
has already stated that XSL should continue to rule on patterns over
any XQL specification. This friction proves that patterns should be
defined independently, especially because they would be a very valuable
resource for programmatic XML handling by DOM processors, as a data
query language, and for other uses.
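
As a rough illustration of patterns used on their own, outside any
stylesheet, the sketch below relies on the small XPath-like path subset
of Python's standard xml.etree.ElementTree module. That subset is
neither XSL patterns nor XQL; it only stands in for them here. The
sample document, its element names and the select() helper are invented
for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented.
SAMPLE = """
<site>
  <page id="home"><title>Home</title><link href="news.xml"/></page>
  <page id="news"><title>News</title>
    <link href="home.xml"/><link href="archive.xml"/>
  </page>
</site>
"""


def select(document: str, pattern: str) -> list:
    """Return the elements of `document` matched by a path pattern."""
    root = ET.fromstring(document)
    return root.findall(pattern)


# The pattern engine used as a plain data query...
titles = [t.text for t in select(SAMPLE, "./page/title")]

# ...and as a programmatic lookup by attribute value. No tree
# construction or formatting step is involved in either case.
news_links = [l.get("href") for l in select(SAMPLE, "./page[@id='news']/link")]

print(titles)
print(news_links)
```

Both calls use the pattern language alone, which is the kind of reuse
that an independent pattern specification would make possible.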

Formatting objects are a specific namespace included in the spec, and
they have no particular need to be hosted by the same specification as
the tree construction part, especially since very few XSL processors
support the FO model and, where they do, the support is very limited.
FO appears as a "plus" for XSL processors. This poses a big risk of
platform fragmentation: the FO part goes unused because not every
processor implements it.


A different model
-----------------

I think the solution for the yet-to-become-evident problems of the XSL
specification would be to separate the three parts into different
specifications, while dropping the stylesheet model which is, in XML,
very misleading.

This is the picture I propose:

a) The tree construction part is separated into an eXtensible
Transformation Language (XTL) which is able to transform any
well-formed XML document into a valid SGML document. If the created
SGML document is still well-formed XML, further transformation
iterations can be applied.

b) XQL + XPointer + XLink are used for any internal reference to the
XML document, either by the transformation processor or by any specific
XML processor written following the Cocoon/OpenXML model.

c) The Formatting Object Language is used as a page formatting
language and is defined in its own specification, clearly separated
from the transformation part.

d) A PostScript-in-XML (XPS) language is defined. XTL processors would
be able to "process" FO documents into XPS documents which may be
directly fed into printers or browsers. This language is intended to be
the common language of 2D renderers.
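
The iteration rule in point a) can be sketched as follows. This is only
an illustration under loud assumptions: XTL does not exist, so each
transformation is modeled as a plain Python function from text to text,
and the step names (note_to_fo, fo_to_html, shout) and the sample
document are invented.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Stand-in for "one XTL sheet applied by a processor" (hypothetical).
Transform = Callable[[str], str]


def is_well_formed(text: str) -> bool:
    """True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def run_pipeline(document: str, steps: List[Transform]) -> str:
    """Apply steps in order while the output is still well-formed XML."""
    if not is_well_formed(document):
        raise ValueError("input must be well-formed XML")
    for step in steps:
        document = step(document)
        if not is_well_formed(document):
            break  # plain SGML output is terminal: no further iteration
    return document


def note_to_fo(doc: str) -> str:
    """Invented step: wrap the note body in a block element."""
    block = ET.Element("block")
    block.text = ET.fromstring(doc).findtext("body")
    return ET.tostring(block, encoding="unicode")


def fo_to_html(doc: str) -> str:
    """Invented step: emit SGML-style HTML with unclosed tags."""
    return "<p>" + (ET.fromstring(doc).text or "") + "<br>"


def shout(doc: str) -> str:
    """Invented step that is skipped once the output stops being XML."""
    return doc.upper()


result = run_pipeline("<note><body>Hello</body></note>",
                      [note_to_fo, fo_to_html, shout])
print(result)
```

The third step is never applied, because the output of the second one
is valid SGML-style markup but no longer well-formed XML.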


Benefits of the new model
-------------------------

There are many different benefits of my proposed
"transformation-based" approach:

1) Knowledge regions are better separated (better learning curve).

2) XTL creates the glue between XML (either standardized namespaces or
personally defined DTDs) and any SGML file, focusing on the ability to
"transform" one representation into another while adding full
programmatic capabilities. This is obviously modeled after the XSL tree
construction model.

3) The XTL model adds the ability to include dynamic parameters in the
transformation. This would allow user input (either at batch time or at
web-request time) to influence the transformation process. For example,
the transformation of a <user-counter/> tag is influenced by the number
of times the specific user has requested the page, while that of a
<general-counter/> tag is influenced by the total number of requests.
Batch processing, for example, would allow the HTML rendering of a
complete web site starting from the root page and following links.

4) The creation of a complete 2D description language using the XML
syntax would allow a single browser to represent _any_ XML DTD, given
the right XML-to-XPS transformation sheet. For example, a FO-to-XPS XTL
file should be included in the FO specification and define how the FO
is mapped into the more general XPS. The human readability of XPS
should be of minimal importance since XPS files will be almost totally
machine generated through the transformation steps.
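
The counter example in point 3) could look roughly like the sketch
below. It is not Cocoon code and not XTL: the RequestStats class, the
transform() function and the choice to render each counter as a <span>
element are all assumptions made for illustration. Only the two tag
names come from the text above.

```python
import xml.etree.ElementTree as ET
from collections import Counter


class RequestStats:
    """Hypothetical request-time state kept by the server."""

    def __init__(self) -> None:
        self.total = 0
        self.per_user: Counter = Counter()

    def record(self, user: str) -> None:
        self.total += 1
        self.per_user[user] += 1


def transform(page: str, user: str, stats: RequestStats) -> str:
    """Replace the counter tags using parameters known only per request."""
    root = ET.fromstring(page)
    values = {
        "user-counter": stats.per_user[user],   # depends on who is asking
        "general-counter": stats.total,         # same for everybody
    }
    for element in list(root.iter()):
        if element.tag in values:
            element.text = str(values[element.tag])
            element.tag = "span"  # assumed output vocabulary
    return ET.tostring(root, encoding="unicode")


PAGE = "<p>You: <user-counter/>, everyone: <general-counter/></p>"

stats = RequestStats()
for visitor in ("alice", "bob", "alice"):
    stats.record(visitor)

alice_view = transform(PAGE, "alice", stats)
bob_view = transform(PAGE, "bob", stats)
print(alice_view)
print(bob_view)
```

The same source page yields different results for the two users, which
is exactly what a parameter-free stylesheet cannot express.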


A browser for everything
------------------------

In such a picture, the ability to browse a world of hyperlinked,
distributed, independent documents and document definitions is achieved
with specific software tools and semantic "glue" between the different
information domains.

For example, today, the ability to understand and correctly render the
MathML language requires the browser to implement those rendering
capabilities directly. In this particular case, XML is no different
from an extended HTML.

Using the transformation model, the MathML language is defined together
with a default MathML-to-XPS transformation. An XPS-aware browser would
be able to apply the transformation file before passing the result to
the XPS rendering engine, or else download the MathML tags already
"precompiled to XPS". This model clearly extends the XSL TC-FO pair to
a wider level of applicability.
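
A minimal sketch of that browser-side choice, under stated assumptions:
XPS is only proposed in this note, so the <xps> and <text> elements,
the REGISTRY table and the toy math_to_xps() mapping are all invented
and say nothing about what a real MathML-to-XPS sheet would contain.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

ToXps = Callable[[ET.Element], ET.Element]


def math_to_xps(math: ET.Element) -> ET.Element:
    """Toy stand-in for a MathML-to-XPS sheet: place each <mi> as text."""
    xps = ET.Element("xps")
    for index, ident in enumerate(math.iter("mi")):
        text = ET.SubElement(xps, "text", x=str(10 * index), y="0")
        text.text = ident.text
    return xps


# Hypothetical table of known transformations, keyed by root element.
REGISTRY: Dict[str, ToXps] = {"math": math_to_xps}


def prepare_for_renderer(document: str) -> str:
    """Return XPS text, transforming first unless already precompiled."""
    root = ET.fromstring(document)
    if root.tag != "xps":
        if root.tag not in REGISTRY:
            raise ValueError(f"no transformation to XPS for <{root.tag}>")
        root = REGISTRY[root.tag](root)
    return ET.tostring(root, encoding="unicode")


from_source = prepare_for_renderer("<math><mi>x</mi><mi>y</mi></math>")
precompiled = prepare_for_renderer(from_source)
print(from_source)
```

Supporting a new vocabulary then means adding one entry to the table,
not adding rendering code to the browser.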

In creating a new document definition, one is able to "connect" to the
"web of knowledge" by simply creating the XTL file that links one's
specific DTD/namespace to an available one. For example, my own web DTD
would transform into a FO namespaced document (the equivalent of
current XSL operation), which is then "compiled" into XPS and sent to
the browser. If some user parameters are needed during the
transformations, the processing is done at request time, either on the
server side, on the client side, or mixed (depending on the software
available). If no parameters or only static parameters are needed
(static pages), the processing is done at a batch level and the page is
compiled into its most useful representation (Note: since XPS
compilation may generate big files, this process would more likely be
done on the client side).
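
The split between batch and request-time processing can be sketched as
a walk over a site that starts at the root page and follows links. The
in-memory SITE dictionary, the <link href="..."/> convention and the
DYNAMIC_TAGS set are assumptions for this example only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple

# Assumed set of tags whose transformation needs request-time parameters.
DYNAMIC_TAGS: Set[str] = {"user-counter", "general-counter"}


def is_dynamic(page: str) -> bool:
    """True if the page contains at least one request-time tag."""
    return any(el.tag in DYNAMIC_TAGS for el in ET.fromstring(page).iter())


def plan_site(site: Dict[str, str], root: str) -> Tuple[List[str], List[str]]:
    """Follow links from `root`; return (batch pages, request-time pages)."""
    batch: List[str] = []
    request_time: List[str] = []
    seen, queue = {root}, [root]
    while queue:
        name = queue.pop(0)
        page = site[name]
        (request_time if is_dynamic(page) else batch).append(name)
        for link in ET.fromstring(page).iter("link"):
            target = link.get("href")
            if target in site and target not in seen:
                seen.add(target)
                queue.append(target)
    return batch, request_time


# Hypothetical three-page site held in memory.
SITE = {
    "index.xml": '<page><link href="about.xml"/><link href="stats.xml"/></page>',
    "about.xml": '<page><p>Static text</p><link href="index.xml"/></page>',
    "stats.xml": '<page><p>Visits: <general-counter/></p></page>',
}

batch, request_time = plan_site(SITE, "index.xml")
print(batch, request_time)
```

Pages in the first list can be compiled once ahead of time; pages in
the second must wait for the parameters of each request.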


Conclusions
-----------

I strongly believe XSL to be a very important step for the creation of a
usable "web of knowledge", but I'm also worried about the possibility
that such a language does not meet the requirements of a global web
publishing framework and poses limitations (especially on server-side
extensibility) that would be rather hard to overcome.

Even if I do understand how much effort has been put into the XSL model,
in this paper I outlined a possible web publishing framework that would
address many of the needs that the current model fails to support. It
also shows how the stylesheet model, probably too much influenced by
the CSS experience, may lead to specification misunderstandings and
friction between different aspects of web technologies in general.

As a final remark, I must specify that this note reflects my own
personal and humble opinions and is meant to be a starting point for an
active collaboration between the Apache Project in general and the Web
Consortium that would allow standards to be defined and open
implementations to support them.


Copyright (c) 1999 by Stefano Mazzocchi (stefano@apache.org). All rights
reserved.

-- 
Stefano Mazzocchi       A language that doesn't affect the way you 
                      think about programming, is not worth knowing.
<stefano@apache.org>                             Alan J. Perlis
---------------------------------------------------------------------

Received on Tuesday, 6 April 1999 06:28:33 UTC