ACTION-1943 -- Erik Bruchez to investigate why XPath 3 chose the function signature they did for serialization

All,

I had a slightly deeper look at the "XSLT and XQuery Serialization 3.0 -
Serialization Parameters" section [1].

Based purely on the spec, it seems that the main reason a tree of
elements is used is the `use-character-maps` serialization parameter,
which requires a list of pairs, as in the following example:

<output:serialization-parameters
       xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
       xmlns:ext="http://example.org/ext">
  <output:method value="ext:jsp"/>
  <output:use-character-maps>
    <output:character-map character="&#xAB;" map-string="&lt;%"/>
    <output:character-map character="&#xBB;" map-string="%&gt;"/>
  </output:use-character-maps>
</output:serialization-parameters>

Each `character-map` element maps from a single character to a string
of characters.
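To make that structure concrete, here is a rough Python sketch (my own illustration, not taken from the spec or any implementation) that collects the `output:character-map` entries of the example above into a plain character -> string dictionary. It uses only the standard library's `xml.etree.ElementTree`; the name `read_character_maps` is made up.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the serialization-parameters example above.
OUTPUT_NS = "http://www.w3.org/2010/xslt-xquery-serialization"


def read_character_maps(params_xml: str) -> dict[str, str]:
    """Hypothetical helper: gather output:character-map entries into a dict."""
    root = ET.fromstring(params_xml)
    result: dict[str, str] = {}
    for entry in root.iter(f"{{{OUTPUT_NS}}}character-map"):
        char = entry.get("character")
        replacement = entry.get("map-string")
        # Each entry maps exactly one character to a string of characters.
        if char is None or len(char) != 1 or replacement is None:
            raise ValueError(f"malformed character-map entry: {entry.attrib}")
        result[char] = replacement
    return result


PARAMS = """<output:serialization-parameters
       xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
       xmlns:ext="http://example.org/ext">
  <output:method value="ext:jsp"/>
  <output:use-character-maps>
    <output:character-map character="&#xAB;" map-string="&lt;%"/>
    <output:character-map character="&#xBB;" map-string="%&gt;"/>
  </output:use-character-maps>
</output:serialization-parameters>"""

# The XML parser has already resolved &#xAB; and &lt; by the time we see them.
print(read_character_maps(PARAMS))  # {'«': '<%', '»': '%>'}
```

Even in this small form, the nesting (a parameter whose value is a list of child elements) is what forces the tree shape.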

Character maps are an XSLT 2 feature that is now making its way, more
generally, into the separate serialization spec. Character maps are
discussed in more detail in [2] (XSLT 2) and [3] (XSLT 3).

Now this is a bit of a drag: if we want to support character maps,
simple attribute name/value pairs don't work anymore, because one of
the values itself needs to be a map of character -> string.

I see a few possibilities:

1. Just follow the XSLT/XQuery 3 format

- Benefits: less spec work for us, leveraging existing knowledge
(although I suppose almost nobody knows much about XSLT 3 yet).
- Drawback: the format is heavy and hideous.

2. Define our own format

We could have a serialization attribute called `character-map`, whose
value is a space-separated list with an even number of tokens, read as
character/string pairs:

  character-map="&#xAB; &lt;% &#xBB; %&gt;"

(Note that in attribute values, whitespace characters other than space
(#x20) must be written as character references, or the parser will
normalize them to spaces.)
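A minimal Python sketch of how such a value could be read, assuming it reaches us after XML parsing (so `&#xAB;` and `&lt;` are already plain characters). The function name `parse_character_map_pairs` is hypothetical, and splitting on U+0020 only is my own reading of the note above, not something any spec defines.

```python
def parse_character_map_pairs(value: str) -> dict[str, str]:
    """Parse the proposed space-separated `character-map` attribute value.

    `value` is the attribute value as delivered by the XML parser, so
    references such as &#xAB; and &lt; have already been resolved.
    """
    # Split on U+0020 only: a tab or newline written as a character
    # reference arrives as a literal character and stays inside its token.
    tokens = [t for t in value.split(" ") if t]
    if len(tokens) % 2 != 0:
        raise ValueError("character-map needs an even number of tokens")
    result: dict[str, str] = {}
    for char, replacement in zip(tokens[0::2], tokens[1::2]):
        if len(char) != 1:
            raise ValueError(f"not a single character: {char!r}")
        result[char] = replacement
    return result


# character-map="&#xAB; &lt;% &#xBB; %&gt;" as seen after XML parsing.
print(parse_character_map_pairs("\u00ab <% \u00bb %>"))  # {'«': '<%', '»': '%>'}
```

One thing the sketch makes visible: since U+0020 is the separator, this format cannot map the space character itself, nor produce a replacement string that contains a space.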

Another possible format uses a JSON object as the attribute value. In
our product, we have a configuration property (used in a context other
than serialization, but a very similar one) which uses JSON maps as
follows:

    <property as="xs:string"
              name="oxf.xforms.filter.input"
              value='{ "&#x2018;": "&#039;",
                       "&#x2019;": "&#039;",
                       "&#x201c;": "\"",
                       "&#x201d;": "\"",
                       "&#x2013;": "&#045;",
                       "&#x2014;": "&#045;",
                       "&#x2219;": "&#045;",
                       "&#x2022;": "&#045;",
                       "&#x00BF;": "&#063;",
                       "&#x2026;": "..." }'/>
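For comparison, a similarly hedged Python sketch of the JSON variant. It assumes the attribute value is seen after XML parsing (so `&#x2018;` and `&#039;` are already plain characters, while the JSON escape `\"` is still there for `json.loads` to handle), and it applies the map one character at a time, which is how I understand character maps to behave. `parse_json_character_map` and `apply_character_map` are made-up names.

```python
import json


def parse_json_character_map(value: str) -> dict[str, str]:
    """Hypothetical: read a JSON object mapping single characters to strings."""
    mapping = json.loads(value)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object")
    for char, replacement in mapping.items():
        if len(char) != 1 or not isinstance(replacement, str):
            raise ValueError(f"bad entry: {char!r} -> {replacement!r}")
    return mapping


def apply_character_map(text: str, mapping: dict[str, str]) -> str:
    # Replace each mapped character; leave every other character unchanged.
    return "".join(mapping.get(c, c) for c in text)


# A subset of the oxf.xforms.filter.input value above, after XML parsing.
VALUE = '{ "\u2018": "\'", "\u2019": "\'", "\u201c": "\\"", "\u2026": "..." }'
mapping = parse_json_character_map(VALUE)
print(apply_character_map("\u2018yes\u2019\u2026", mapping))  # 'yes'...
```

Unlike the space-separated form, JSON strings can carry spaces and escapes, at the cost of mixing two escaping layers (XML and JSON) in one attribute.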

3. Support both

If the element passed is `output:serialization-parameters`, then we
use the XSLT 3 format. Otherwise, we use name/value attributes.
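Option 3 could dispatch on the element's qualified name. The rough sketch below again assumes `xml.etree.ElementTree`; the non-XSLT element name `serialization` and the function name `read_serialization_params` are purely hypothetical, and a `character-map` attribute on the attribute-style element is left as a raw string, to be parsed in whichever option-2 format we pick.

```python
import xml.etree.ElementTree as ET

OUTPUT_NS = "http://www.w3.org/2010/xslt-xquery-serialization"


def read_serialization_params(elem: ET.Element) -> dict:
    """Hypothetical dispatcher for option 3."""
    if elem.tag == f"{{{OUTPUT_NS}}}serialization-parameters":
        # XSLT 3 format: one child element per parameter, with the value in
        # @value, except use-character-maps, which nests character-map children.
        params: dict = {}
        for child in elem:
            name = child.tag.split("}", 1)[-1]  # drop the {namespace} prefix
            if name == "use-character-maps":
                params[name] = {cm.get("character"): cm.get("map-string") for cm in child}
            else:
                params[name] = child.get("value")
        return params
    # Any other element: plain name/value attributes.
    return dict(elem.attrib)


xslt_style = ET.fromstring(
    f'<output:serialization-parameters xmlns:output="{OUTPUT_NS}">'
    '<output:method value="xml"/><output:indent value="yes"/>'
    "</output:serialization-parameters>"
)
attr_style = ET.fromstring('<serialization method="xml" indent="yes"/>')
print(read_serialization_params(xslt_style))  # {'method': 'xml', 'indent': 'yes'}
print(read_serialization_params(attr_style))  # {'method': 'xml', 'indent': 'yes'}
```

In this sketch both syntaxes end up in the same dictionary, so the rest of the processor would not need to know which one the author used.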

I don't have a firm opinion on this yet.

Feedback welcome,

-Erik

[1] http://www.w3.org/TR/xslt-xquery-serialization-30/#serparam
[2] http://www.w3.org/TR/xslt20/#character-maps
[3] http://www.w3.org/TR/xslt-30/#character-maps

Received on Tuesday, 7 May 2013 21:49:07 UTC