Serialization Analysis and Proposal from Alex Milowski on 2007-05-15 (public-xml-processing-model-wg@w3.org from May 2007)

From: Alex Milowski <alex@milowski.org>
Date: Tue, 15 May 2007 13:14:54 -0700
To: "XProc WG" <public-xml-processing-model-wg@w3.org>
Message-ID: <28d56ece0705151314j661302ccga2024bc5a16e7045@mail.gmail.com>
The XQuery and XSLT 2.0 serialization specification is:

   http://www.w3.org/TR/xslt-xquery-serialization/

XSLT 1.0 defines its serialization methods here:

   http://www.w3.org/TR/xslt#output

A serialization method has a name that is represented by a QName.  If the
namespace name is null, then the method must be one of the standard
serialization methods defined by the serialization spec.
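
A minimal sketch of that rule, in Python purely for illustration.  The
function name and the (namespace, local name) pair representation are my
own assumptions; the four standard method names are the ones defined by
the XSLT 2.0 / XQuery 1.0 serialization spec.

```python
# Method names defined by the XSLT 2.0 / XQuery 1.0 serialization spec.
STANDARD_METHODS = {"xml", "html", "xhtml", "text"}


def classify_method(namespace, local_name):
    """Classify a serialization method QName (hypothetical helper).

    A name in no namespace must be one of the standard methods; a name in
    a namespace is treated as an implementation-defined extension.
    """
    if not namespace:
        if local_name not in STANDARD_METHODS:
            raise ValueError("unknown standard method: %r" % local_name)
        return "standard"
    return "extension"


print(classify_method(None, "xhtml"))
print(classify_method("http://example.org/ns", "json"))
```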


Parameters to serialization (see [1]):

   1. version                - The version of XML (i.e. 1.0 or 1.1).
   2. encoding               - The requested Unicode encoding (e.g. "UTF-8").
   3. indent                 - A boolean value for whether or not to indent
                               the markup (true = indent).
   4. cdata-section-elements - A list of CDATA section elements.
   5. omit-xml-declaration   - A boolean value for whether to omit the XML
                               declaration (true = omit).
   6. standalone             - The value of the 'standalone' pseudo-attribute
                               in the XML declaration.
   7. undeclare-prefixes     - A boolean value indicating whether to undeclare
                               unused prefixes if the version of XML is 1.1
                               (true = undeclare).
   8. normalization-form     - The Unicode normalization form to be used.
   9. media-type             - The media type to be associated with the
                               serialization.
   10. use-character-maps    - The character maps to use.
   11. byte-order-mark       - A boolean value indicating whether to use the
                               byte order mark (true = BOM).
   12. escape-uri-attributes - A boolean value indicating whether to escape
                               known uri attributes (true = escape).  This is
                               only applicable to the 'html' and 'xhtml'
                               methods.
   13. include-content-type  - A boolean value indicating whether to include
                               the content type in the serialization
                               (applicable to the 'html' and 'xhtml' methods)
                               (true = include).

With the exception of the 'use-character-maps' parameter, these can each be
specified with a simple-typed attribute.  The description of character
maps requires an element like XSLT 2.0's xsl:character-map element (see [2]).
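
To illustrate the difference, here is a rough Python sketch.  The parameter
values and the sample map are made up for the example, and the helper name
is hypothetical.  A simple parameter is a single scalar value, whereas a
character map is a table from characters to replacement strings, which is
why it needs its own element.

```python
# Simple-typed parameters: each value fits in one attribute.
simple_params = {
    "encoding": "UTF-8",
    "indent": True,
    "omit-xml-declaration": False,
}

# A character map is a table, one entry per mapped character
# (compare the xsl:output-character children of xsl:character-map).
# These two entries are illustrative only.
entity_map = {
    "\u00a0": "&nbsp;",   # NO-BREAK SPACE
    "\u00a9": "&copy;",   # COPYRIGHT SIGN
}


def apply_character_map(text, char_map):
    """Replace each mapped character with its string (hypothetical helper).

    This shows only the substitution step; a real serializer would also
    escape the remaining characters as its method requires.
    """
    return text.translate(str.maketrans(char_map))


print(apply_character_map("\u00a9\u00a02007", entity_map))
```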

We have three places where serialization may need to be controlled or provided
to the process executing the pipeline:

   1. The output port of a pipeline.

   2. The p:store step.

   3. The entity body of an HTTP request for p:http-request.

It also looks to me like XSLT 1.0's serialization is a subset of XSLT 2.0's.
It is easy to detect parameters that can't be supported by an XSLT 1.0
serialization engine.  In addition, the XSLT 2.0 specification says:

   "An implementation may allow the attributes of the xsl:output declaration
    to be overridden, or the default values to be changed, using the API that
    controls the transformation."

That means an implementation can choose to ignore a serialization parameter
although a user would see that as a non-feature.  I assume that is to
allow serialization parameters to be set by the invocation of the transform
and not always be dictated by the XSLT transformation.
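
As a sketch of the detection mentioned above (Python, with a hypothetical
function name): the set below lists the attributes that XSLT 1.0 defines on
xsl:output, and any requested parameter outside that set cannot be honored
by a 1.0 serialization engine.

```python
# Attributes defined on xsl:output by XSLT 1.0.
XSLT1_OUTPUT_ATTRIBUTES = {
    "method",
    "version",
    "encoding",
    "omit-xml-declaration",
    "standalone",
    "doctype-public",
    "doctype-system",
    "cdata-section-elements",
    "indent",
    "media-type",
}


def unsupported_by_xslt1(params):
    """Return the requested parameter names that XSLT 1.0 does not define."""
    return sorted(name for name in params if name not in XSLT1_OUTPUT_ATTRIBUTES)


requested = {
    "encoding": "UTF-8",
    "byte-order-mark": "true",
    "normalization-form": "NFC",
}
print(unsupported_by_xslt1(requested))
```

Whether an implementation rejects such parameters or silently ignores them
is a policy choice this sketch does not make.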

Given all of this, I think we need:

  1. A corresponding element like the xsl:character-map (i.e. p:character-map)
     element that occurs as a sibling of p:output for the p:pipeline element.

  2. A new element called p:serialization that has a 'name' attribute
     and all the serialization parameters as attributes, and that is allowed
     as a sibling of p:output for the p:pipeline element.  The value of
     the 'use-character-maps' attribute is a list of QName values that
     correspond to the names of p:character-map elements.

  3. For p:store and p:http-request, we allow all the serialization
     parameters as options to the step.  The value for the
     'use-character-maps' option is a list of QName values that correspond
     to our equivalent of the xsl:character-map element (i.e.
     p:character-map).

  4. We add a 'serialization' option to the p:store and p:http-request
     that names a p:serialization element to use.  The values of
     the serialization parameters are merged where the locally
     defined options on the step are preferred over those of
     the p:serialization element.

  5. We add a 'serialization' attribute to the p:output element.  If
     the pipeline output has one of these attributes and the output
     is serialized, the processor should apply that serialization.
     These serialization options may be overridden via invocation
     just as for XSLT 2.0.

     If the 'serialization' attribute is specified on step outputs,
     it is ignored.  Note: It could be useful for journaling
     the outputs.
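
A rough sketch of how items 1 through 5 could fit together, written in
Python so the merge rule from item 4 can be shown running.  Everything in
the markup is hypothetical: the elements and attributes only mirror the
proposal above, the p:output-character child is borrowed from XSLT 2.0's
xsl:output-character, and the namespace URI is the one used by the later
XProc 1.0 Recommendation, chosen here only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI taken from the later XProc 1.0 Recommendation.
P = "http://www.w3.org/ns/xproc"
NS = {"p": P}

# Hypothetical markup following the proposal; not an agreed syntax.
PIPELINE = f"""
<p:pipeline xmlns:p="{P}" name="example">
  <p:character-map name="entities">
    <p:output-character character="&#160;" string="&amp;nbsp;"/>
  </p:character-map>
  <p:serialization name="pretty-xml" encoding="UTF-8" indent="true"
                   use-character-maps="entities"/>
  <p:output port="result" serialization="pretty-xml"/>
</p:pipeline>
""".strip()


def load_serializations(root):
    """Index the p:serialization declarations by their 'name' attribute."""
    table = {}
    for element in root.findall("p:serialization", NS):
        params = dict(element.attrib)
        table[params.pop("name")] = params
    return table


def effective_parameters(step_options, serializations):
    """Merge per item 4: options set on the step win over the named set."""
    options = dict(step_options)
    named = serializations.get(options.pop("serialization", None), {})
    return {**named, **options}


root = ET.fromstring(PIPELINE)
serializations = load_serializations(root)

# Item 5: the pipeline output names a serialization declaration.
output = root.find("p:output", NS)
print(serializations[output.get("serialization")])

# Item 4: a p:store-like step refers to the declaration, overrides one
# parameter and adds another.
store_options = {
    "serialization": "pretty-xml",
    "indent": "false",
    "byte-order-mark": "true",
}
params = effective_parameters(store_options, serializations)
print(params)
```

The sketch treats names as plain strings; real QName values would need
their prefixes resolved against the in-scope namespaces before lookup.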

Having a 'p:serialization' declaration element has the advantage of
allowing a pipeline library to declare serialization for a set of
pipelines.  We could decide to provide a way for such declarations
to be made available through the library as well as the pipelines.
One possibility is for the p:pipeline-library to allow the p:serialization
element as well.


[1] http://www.w3.org/TR/xslt-xquery-serialization/#serparam
[2] http://www.w3.org/TR/xslt20/#character-maps


-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Tuesday, 15 May 2007 20:15:13 UTC