Serialization Control from Alex Milowski on 2007-07-11 (public-xml-processing-model-wg@w3.org from July 2007)

From: Alex Milowski <alex@milowski.org>
Date: Wed, 11 Jul 2007 13:14:58 -0700
To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>
Message-ID: <28d56ece0707111314w50a4fe14v96ec70cfa3a44d29@mail.gmail.com>
I've considered a number of different ways for an author to declare their
serialization intent within pipelines.  We have this set of options:

   1. version                - The version of XML (i.e. 1.0 or 1.1).
   2. encoding               - The requested Unicode encoding (e.g. "UTF-8").
   3. indent                 - A boolean value for whether or not to indent
                               the markup (true = indent).
   4. cdata-section-elements - A list of CDATA section elements.
   5. omit-xml-declaration   - A boolean value for whether to omit the XML
                               declaration (true = omit).
   6. standalone             - The value of the 'standalone' pseudo-attribute
                               in the XML declaration.
   7. undeclare-prefixes     - A boolean value indicating whether to undeclare
                               unused prefixes if the version of XML is 1.1
                               (true = undeclare)
   8. normalization-form     - The Unicode normalization form to be used.
   9. media-type             - The media type to be associated with the
                               serialization.
   10. use-character-maps    - The character map to use.
   11. byte-order-mark       - A boolean value indicating whether to use the
                               byte order mark (true = BOM).
   12. escape-uri-attributes - A boolean value indicating whether to escape
                               known uri attributes (true = escape).  This is
                               only applicable to the 'html' and 'xhtml'
                               methods.
   13. include-content-type  - A boolean value indicating whether to include
                               the content type in the serialization
                               (applicable to the 'html' and 'xhtml' methods)
                               (true = include).

I don't believe adding these options, or any way to specify them, directly
to p:output is a good choice, as there are plenty of contexts where they
don't apply.

I also can see some benefit in having serialization control for p:log as well.

As such, I propose we add a "p:serialization" element to allow an author
to declare a set of serialization options and give them a name.  For example:

   <p:serialization name="myxml" method="xml"
                    omit-xml-declaration="yes" indent="yes"/>

All the serialization options listed above plus 'method' would be
attributes on p:serialization.
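
As a further sketch, a declaration for HTML output might combine several
of the options above.  The name "myhtml" and the particular values are
only illustrative, and the namespace declaration is omitted as in the
other fragments:

   <p:serialization name="myhtml" method="html" encoding="UTF-8"
                    media-type="text/html" include-content-type="yes"
                    escape-uri-attributes="yes"/>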

This could occur only in p:pipeline:

<p:pipeline
  name? = NCName
  type? = QName
  ignore-prefixes? = prefix list>
    (p:input |
     p:output |
     p:option |
     p:import |
     p:declare-step |
     p:log |
     p:serialization )*,
    subpipeline
</p:pipeline>

We then allow p:log and p:output to have an optional 'serialization'
attribute whose value must match one of the names on the p:serialization
elements.
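
A minimal sketch of how this might look, reusing the "myxml" declaration
from the example above.  The 'port' and 'href' attributes, the pipeline
name, and the log file name are assumptions for illustration; only the
'serialization' attribute is what is being proposed here:

   <p:pipeline name="main">
      <p:output port="result" serialization="myxml"/>
      <p:log port="result" href="log.xml" serialization="myxml"/>
      <p:serialization name="myxml" method="xml"
                       omit-xml-declaration="yes" indent="yes"/>
      <!-- subpipeline steps go here -->
   </p:pipeline>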

The scope of the p:serialization declaration would be the pipeline document.

If an output is not serialized, the 'serialization' attribute has no
effect.  If the output is serialized, a processor *may* use the
serialization options.  An implementation would be free to provide a way
to override these serialization options, including always overriding them
with a fixed set.

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Wednesday, 11 July 2007 20:15:02 UTC