ACTION-1943 -- Erik Bruchez to investigate why XPath 3 chose the function signature they did for serialization


I had a slightly deeper look at the "XSLT and XQuery Serialization
3.0 - Serialization Parameters" section [1].

Based purely on the spec, it seems that the main reason a tree of
elements is used, rather than simple name/value pairs, is the
`use-character-maps` serialization parameter, which requires a list of
pairs, as in the following example:

  <output:method value="ext:jsp"/>
  <output:use-character-maps>
    <output:character-map character="&#xAB;" map-string="&lt;%"/>
    <output:character-map character="&#xBB;" map-string="%&gt;"/>
  </output:use-character-maps>

Each `character-map` element maps from a single character to a string
of characters.
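To make the shape of this format concrete, here is a minimal Python
sketch that reads the character map out of such a tree into a plain
dictionary, using only the standard `xml.etree.ElementTree` module. It
assumes the `output` prefix is bound to the Serialization 3.0
namespace `http://www.w3.org/2010/xslt-xquery-serialization` and wraps
the example in an `output:serialization-parameters` element; the
function name `character_map_from_params` is illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the output:* vocabulary (Serialization 3.0).
OUTPUT_NS = "http://www.w3.org/2010/xslt-xquery-serialization"


def q(local):
    """ElementTree-style qualified name: {namespace}local."""
    return f"{{{OUTPUT_NS}}}{local}"


def character_map_from_params(params):
    """Collect character -> string pairs under output:use-character-maps."""
    result = {}
    maps = params.find(q("use-character-maps"))
    if maps is None:
        return result
    for cm in maps.findall(q("character-map")):
        char = cm.get("character")
        if char is None or len(char) != 1:
            raise ValueError(f"'character' must be one character: {char!r}")
        result[char] = cm.get("map-string", "")
    return result


doc = f"""<output:serialization-parameters xmlns:output="{OUTPUT_NS}">
  <output:method value="ext:jsp"/>
  <output:use-character-maps>
    <output:character-map character="&#xAB;" map-string="&lt;%"/>
    <output:character-map character="&#xBB;" map-string="%&gt;"/>
  </output:use-character-maps>
</output:serialization-parameters>"""

print(character_map_from_params(ET.fromstring(doc)))  # {'«': '<%', '»': '%>'}
```

Whatever syntax ends up being chosen, the value an implementation needs
is the same: a map from single characters to strings.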

Character maps are an XSLT 2 feature which is now making its way, more
generally, into the separate serialization spec. Character maps are
discussed in more detail here [2] (XSLT 2) and here [3] (XSLT 3).

Now this is a bit of a drag: if we want to support this, then simple
name/value attributes no longer suffice, because one of the values
must itself be a map from character to string.

I see a few possibilities:

1. Just follow the XSLT/XQuery 3 format

- Benefits: less spec work for us, and we leverage existing knowledge
(although I suppose almost nobody knows anything about XSLT 3 yet).
- Drawback: the format is heavy and hideous.

2. Define our own format

We could have a serialization attribute called `character-map`,
holding a space-separated list with an even number of tokens that
alternate between a character and its replacement string:

  character-map="&#xAB; &lt;% &#xBB; %&gt;"

(Note that, in attributes, whitespace characters other than space
(#x20) must be written as character references, or the parser will
normalize them to spaces.)
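A sketch of how an implementation might split such an attribute, again
in Python with ElementTree. The element name `params` and the function
name are hypothetical. Splitting on U+0020 only, rather than on any
whitespace, is deliberate: a tab or newline written as `&#x9;` or
`&#xA;` survives attribute-value normalization and may legitimately be
a mapped character or part of a replacement string.

```python
import xml.etree.ElementTree as ET


def parse_character_map_attr(value):
    """Turn 'c1 s1 c2 s2 ...' into {c1: s1, c2: s2, ...}."""
    # Split on U+0020 only and drop empty tokens from repeated spaces.
    tokens = [t for t in value.split(" ") if t]
    if len(tokens) % 2 != 0:
        raise ValueError("character-map needs an even number of tokens")
    result = {}
    for char, replacement in zip(tokens[0::2], tokens[1::2]):
        if len(char) != 1:
            raise ValueError(f"not a single character: {char!r}")
        result[char] = replacement
    return result


# 'params' is a hypothetical element carrying the proposed attribute.
# The XML parser resolves &#xAB;, &lt; and &gt; before we split.
elem = ET.fromstring('<params character-map="&#xAB; &lt;% &#xBB; %&gt;"/>')
print(parse_character_map_attr(elem.get("character-map")))  # {'«': '<%', '»': '%>'}
```

One limitation of this syntax: since space is the separator, the space
character itself cannot be mapped or appear inside a replacement
string.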

Another possible format uses a JSON object as the attribute value. In
our product, we have a configuration property (in a context different
from serialization, but very similar) which uses JSON objects as
follows:

    <property as="xs:string"
              value='{ "&#x2018;": "&#039;",
                       "&#x2019;": "&#039;",
                       "&#x201c;": "\"",
                       "&#x201d;": "\"",
                       "&#x2013;": "&#045;",
                       "&#x2014;": "&#045;",
                       "&#x2219;": "&#045;",
                       "&#x2022;": "&#045;",
                       "&#x00BF;": "&#063;",
                       "&#x2026;": "..." }'/>

3. Support both

If the element passed is `output:serialization-parameters`, then we
use the XSLT 3 format. Otherwise, we use name/value attributes.
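Option 3 could be dispatched on the element name, roughly as follows.
This Python sketch assumes the Serialization 3.0 namespace URI, treats
any other element as carrying plain name/value attributes (with
`character-map` in the space-separated syntax of option 2), and
normalizes both formats so that the character map always ends up as a
dictionary under the `use-character-maps` key. The `serialization`
element name and that choice of key are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed Serialization 3.0 namespace for the output:* vocabulary.
OUTPUT_NS = "http://www.w3.org/2010/xslt-xquery-serialization"


def q(local):
    """ElementTree-style qualified name: {namespace}local."""
    return f"{{{OUTPUT_NS}}}{local}"


def serialization_params(elem):
    """Return serialization parameters as a dict of name -> value.

    The character map, if any, is stored under 'use-character-maps'
    as a dict of character -> replacement string.
    """
    if elem.tag == q("serialization-parameters"):
        # XSLT/XQuery 3 format: one child element per parameter.
        params = {}
        for child in elem:
            name = child.tag.split("}", 1)[-1]
            if name == "use-character-maps":
                params[name] = {
                    cm.get("character"): cm.get("map-string", "")
                    for cm in child.findall(q("character-map"))
                }
            else:
                params[name] = child.get("value")
        return params

    # Hypothetical alternative: plain name/value attributes, with the
    # character map as a space-separated list of character/string pairs.
    params = dict(elem.attrib)
    if "character-map" in params:
        tokens = [t for t in params.pop("character-map").split(" ") if t]
        if len(tokens) % 2 != 0:
            raise ValueError("character-map needs an even number of tokens")
        params["use-character-maps"] = dict(zip(tokens[0::2], tokens[1::2]))
    return params


attrs = ET.fromstring('<serialization method="xml" character-map="&#xAB; &lt;%"/>')
print(serialization_params(attrs))  # {'method': 'xml', 'use-character-maps': {'«': '<%'}}
```

Normalizing both syntaxes into one internal shape means the rest of the
serializer would not need to know which syntax the author used.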

I don't have a fully defined opinion on this at this point.

Feedback welcome,



Received on Tuesday, 7 May 2013 21:49:06 UTC