
Re: p:store serialization "character-map like"

From: RICAUD-DUSSARGET Matthieu <matthieu.ricaud@igs-cp.fr>
Date: Tue, 21 Jan 2014 15:27:23 +0100
Message-ID: <CADRkOwHyk9r7uWH=BokX2uArpaT175pDwWtFA_iUk9uAb8XQ5g@mail.gmail.com>
To: XProc Dev <xproc-dev@w3.org>
Thank you, Florent; that is what I thought...

But this is not a big problem. Assuming that writing
<xsl:output-character character="&#8201;" string="&amp;#160;"/> is not
good practice, whereas
<xsl:output-character character="&#8201;" string="&#160;"/> is,
I have written a custom XSLT step that stores each xsl:result-document
with the correct serialization, including character-map replacement.
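For reference, a character map of the "good" kind is declared and attached to a named output format roughly as follows in XSLT 2.0. This is a minimal sketch; the names `nbsp-map` and `my-format` are only examples.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- example map: serialize THIN SPACE (U+2009) as NO-BREAK SPACE (U+00A0) -->
  <xsl:character-map name="nbsp-map">
    <xsl:output-character character="&#8201;" string="&#160;"/>
  </xsl:character-map>
  <!-- named output format; an xsl:result-document selects it with format="my-format" -->
  <xsl:output name="my-format" method="xml" encoding="UTF-8"
              use-character-maps="nbsp-map"/>
</xsl:stylesheet>
```

When the XSLT processor serializes the result itself, the map is applied natively. As Florent points out below, with p:xslt in Calabash the serialization is probably not done by the XSLT processor, which is why igs:xslt reads these declarations back from the stylesheet and re-applies them.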

In case anyone is interested, the step is at the end of this mail.
Perhaps XProc vNext will make this kind of step simpler?

Any comments or suggestions to improve this step are, of course, welcome.

Best regards,

Matthieu

-----------------------------------------------------------------
igs:xslt can be used like this:

<igs:xslt debugFileName="step3_idx.xml">
  <p:input port="stylesheet"><p:document href="my_xslt_with_resultdoc.xsl"/></p:input>
</igs:xslt>

In any case, the XSLT must produce a well-formed XML document as its primary
output, so if you only use xsl:result-document, please add a dummy primary
output.
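A minimal sketch of a stylesheet that meets this requirement; the href `out/chapter.xml`, the format name `my-format` and the `dummy` element are placeholders, not names the step depends on.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output name="my-format" method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- secondary result: igs:xslt appends "___my-format" to this href,
         then stores the document with the "my-format" serialization -->
    <xsl:result-document href="out/chapter.xml" format="my-format">
      <xsl:copy-of select="."/>
    </xsl:result-document>
    <!-- dummy primary result, so that p:xslt always gets a well-formed document -->
    <dummy/>
  </xsl:template>
</xsl:stylesheet>
```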

<p:declare-step type="igs:xslt" name="current">
  <!--input/output ports-->
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="parameters" kind="parameter" primary="true"/>
  <p:input port="stylesheet"/>
  <p:output port="result" primary="true"/>
  <!--options-->
  <p:option name="debugFileName" select="''"/>
  <!--variables-->
  <p:variable name="xml.base-uri" select="base-uri()"/>
  <p:variable name="xsl.base-uri" select="base-uri()">
    <p:pipe port="stylesheet" step="current"/>
  </p:variable>
  <p:variable name="xsl.name" select="tokenize($xsl.base-uri,'/')[last()]"/>
  <p:variable name="debug.file.name"
    select="if ($debugFileName='') then (concat('debug/',$xsl.name,'_OUT.xml'))
            else (concat('debug/',$debugFileName))"/>
  <p:variable name="debug.file.uri" select="resolve-uri($debug.file.name, string($xml.base-uri))"/>
  <p:variable name="href_format_sep" select="'___'"/>
  <!-- 1. GET_SERIALIZE-OPTIONS -->
  <p:xslt name="get_serialize-options">
    <p:input port="source"><p:pipe port="stylesheet" step="current"/></p:input>
    <p:input port="stylesheet">
      <p:inline>
        <xsl:stylesheet version="2.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:igs="http://www.igs-cp.fr"
          xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
          <xsl:import href="http://localhost:777/neodev.trunk/XSL_LIBS/igs/common.xsl"/>
          <xsl:template match="/">
            <igs:serialize>
              <xsl:apply-templates mode="make_output_for_xproc_serialization"/>
            </igs:serialize>
          </xsl:template>
          <xsl:template match="xsl:stylesheet" mode="make_output_for_xproc_serialization">
            <xsl:variable name="xsl.uri" select="document-uri(/)"/>
            <xsl:variable name="xsl.name" select="tokenize(string($xsl.uri),'/')[last()]"/>
            <igs:xslt name="{$xsl.name}">
              <xsl:copy-of select="namespace::node()"/>
              <xsl:if test="count(xslt:character-map|xslt:output)!=0">
                <igs:serialize-options>
                  <xsl:for-each select="xslt:character-map|xslt:output">
                    <xsl:copy-of select="."/>
                  </xsl:for-each>
                </igs:serialize-options>
              </xsl:if>
              <xsl:apply-templates select="xsl:import|xsl:include"
                mode="make_output_for_xproc_serialization"/>
            </igs:xslt>
          </xsl:template>
          <xsl:template match="xsl:import|xsl:include" mode="make_output_for_xproc_serialization">
            <xsl:apply-templates select="document(@href)/xsl:stylesheet"
              mode="make_output_for_xproc_serialization"/>
          </xsl:template>
        </xsl:stylesheet>
      </p:inline>
    </p:input>
  </p:xslt>
  <igs:logFile>
    <p:with-option name="href" select="resolve-uri('get_serialize-options.xml',string(exf:cwd()))"/>
  </igs:logFile>
  <p:sink/>
  <!-- 2. ADD_FORMAT_TO_HREF_RESULTDOC -->
  <p:label-elements match="xsl:result-document[@format]" attribute="href"
    name="add_format_to_href_resultdoc">
    <!--concat(@href,$href_format_sep,@format)-->
    <p:with-option name="label"
      select="concat('concat(@href,&quot;', $href_format_sep, '&quot;,@format)')"/>
    <p:input port="source"><p:pipe port="stylesheet" step="current"/></p:input>
  </p:label-elements>
  <igs:logFile>
    <p:with-option name="href" select="resolve-uri('ecf2contentdoc_format.xsl',string(exf:cwd()))"/>
  </igs:logFile>
  <p:sink/>
  <!-- 3. PROCESS TRANSFORMATION -->
  <p:xslt name="transform">
    <p:input port="source"><p:pipe port="source" step="current"/></p:input>
    <p:input port="stylesheet"><p:pipe port="result" step="add_format_to_href_resultdoc"/></p:input>
  </p:xslt>
  <p:sink/>
  <!-- 4. STORE EACH XSL:RESULT-DOC -->
  <p:for-each name="store_resultdoc_with_serialization">
    <p:iteration-source><p:pipe step="transform" port="secondary"/></p:iteration-source>
    <p:variable name="resultdoc.serialized_uri" select="p:base-uri()"/>
    <p:variable name="resultdoc.uri" select="replace($resultdoc.serialized_uri,'___.*$','')"/>
    <p:variable name="resultdoc.format" select="replace($resultdoc.serialized_uri,'^.*___','')"/>
    <!--<p:variable name="resultdoc.uri" select="replace($resultdoc.serialized_uri,concat($href_format_sep,'.*$'),'')"/>
    <p:variable name="resultdoc.format" select="replace($resultdoc.serialized_uri,concat('^.*',$href_format_sep),'')"/>-->
    <!--<cx:message><p:with-option name="message" select="concat(
      'resultdoc.serialized_uri : ',$resultdoc.serialized_uri,'&#10;',
      'resultdoc.uri : ',$resultdoc.uri,'&#10;',
      'resultdoc.format : ',$resultdoc.format,'&#10;'
    )"/>
    </cx:message>-->
    <p:choose>
      <p:xpath-context><p:pipe port="result" step="get_serialize-options"/></p:xpath-context>
      <p:when test="exists(//xsl:output[@name=$resultdoc.format]) or exists(//xsl:output[not(@name)])">
        <!--There is an xsl:output with the matching format, or a generic (unnamed)
          xsl:output, in the XSLT-->
        <!--xsl:output attributes: byte-order-mark, cdata-section-elements,
          doctype-public, doctype-system, encoding, escape-uri-attributes,
          include-content-type, indent, media-type, method, name, normalization-form,
          omit-xml-declaration, standalone, undeclare-prefixes, use-character-maps,
          version-->
        <!--p:store options: the same, except @name and @use-character-maps-->
        <!--see http://www.w3.org/TR/xslt20/#serialization for the default values-->
        <p:variable name="serialize.method"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@method, 'xml')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.encoding"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@encoding, 'UTF-8')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.byte-order-mark"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@byte-order-mark, 'yes')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.cdata-section-elements"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@cdata-section-elements, '')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.doctype-system"
          select="(//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@doctype-system">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.doctype-public"
          select="(//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@doctype-public">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.escape-uri-attributes"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@escape-uri-attributes, 'yes')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.include-content-type"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@include-content-type, 'yes')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.indent"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@indent,
                   if ($serialize.method='xhtml' or $serialize.method='html') then ('yes') else ('no'))[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.media-type"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@media-type,
                   if ($serialize.method='xml') then ('text/xml')
                   else if ($serialize.method='xhtml' or $serialize.method='html') then ('text/html')
                   else if ($serialize.method='text') then ('text/plain')
                   else (''))[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.normalization-form"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@normalization-form, 'none')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.omit-xml-declaration"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@omit-xml-declaration, 'no')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.standalone"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@standalone, 'omit')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.undeclare-prefixes"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@undeclare-prefixes, '')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.version"
          select="((//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@version, '1.0')[1]">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <p:variable name="serialize.use-character-maps"
          select="(//xsl:output[@name=$resultdoc.format], //xsl:output[not(@name)])[1]/@use-character-maps">
          <p:pipe port="result" step="get_serialize-options"/>
        </p:variable>
        <!--<cx:message>
          <p:with-option name="message" select="concat(
            'There is an xsl:output with the matching format, or a generic xsl:output, in the XSLT', '&#10;',
            '$serialize.method=',$serialize.method, '&#10;',
            '$serialize.encoding=',$serialize.encoding, '&#10;',
            '$serialize.byte-order-mark=',$serialize.byte-order-mark, '&#10;',
            '$serialize.cdata-section-elements=',$serialize.cdata-section-elements, '&#10;',
            '$serialize.doctype-system=',$serialize.doctype-system, '&#10;',
            '$serialize.doctype-public=',$serialize.doctype-public, '&#10;',
            '$serialize.escape-uri-attributes=',$serialize.escape-uri-attributes, '&#10;',
            '$serialize.include-content-type=',$serialize.include-content-type, '&#10;',
            '$serialize.indent=',$serialize.indent, '&#10;',
            '$serialize.media-type=',$serialize.media-type, '&#10;',
            '$serialize.normalization-form=',$serialize.normalization-form, '&#10;',
            '$serialize.omit-xml-declaration=',$serialize.omit-xml-declaration, '&#10;',
            '$serialize.standalone=',$serialize.standalone, '&#10;',
            '$serialize.undeclare-prefixes=',$serialize.undeclare-prefixes, '&#10;',
            '$serialize.version=',$serialize.version, '&#10;'
          )"/>
        </cx:message>-->
        <!--Insert the serialize options into the XML document (they will be removed
          by character-maps_with_replace.xsl)-->
        <p:insert match="/*" position="first-child">
          <p:input port="insertion">
            <p:pipe port="result" step="get_serialize-options"/>
          </p:input>
        </p:insert>
        <p:choose>
          <p:when test="$serialize.use-character-maps!=''">
            <p:xslt name="make_character-maps_xslt">
              <p:input port="stylesheet">
                <p:inline>
                  <xsl:stylesheet version="2.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    xmlns:igs="http://www.igs-cp.fr"
                    xmlns:h="http://www.w3.org/1999/xhtml"
                    xmlns:saxon="http://saxon.sf.net/"
                    exclude-result-prefixes="#all">
                    <xsl:param name="use-character-maps" required="yes"/>
                    <!-- "*" rather than "?" so that several matching maps reach the
                      error message below instead of raising a type error -->
                    <xsl:variable name="character-map"
                      select="//xsl:character-map[@name=$use-character-maps]"
                      as="element(xsl:character-map)*"/>
                    <xsl:template match="/">
                      <xsl:choose>
                        <xsl:when test="count($character-map)!=1">
                          <xsl:message terminate="yes">[ERROR] <xsl:value-of
                            select="count($character-map)"/> character-maps found instead of exactly
                            one (use-character-maps="<xsl:value-of
                            select="$use-character-maps"/>").</xsl:message>
                        </xsl:when>
                        <xsl:otherwise>
                          <xsl:next-match/>
                        </xsl:otherwise>
                      </xsl:choose>
                    </xsl:template>
                    <!-- avoid a Saxon warning when no template rule matches elements in the
                      namespace of the source document's root element (here XHTML) -->
                    <xsl:template match="h:html">
                      <xsl:next-match/>
                    </xsl:template>
                    <!--default copy-->
                    <xsl:template match="* | @* | processing-instruction() | comment()">
                      <xsl:copy copy-namespaces="no">
                        <xsl:apply-templates select="node()|@*"/>
                      </xsl:copy>
                    </xsl:template>
                    <!-- delete igs:serialize (which holds igs:serialize-options); it was added
                      by the previous step so that it can be accessed within the same document -->
                    <xsl:template match="igs:serialize"/>
                    <!--text replacement according to the matching character-map-->
                    <xsl:template match="text()">
                      <xsl:call-template name="replace_character">
                        <xsl:with-param name="text" select="." as="xs:string"/>
                        <xsl:with-param name="character-map" select="$character-map"
                          as="element(xsl:character-map)"/>
                      </xsl:call-template>
                    </xsl:template>
                    <xsl:template name="replace_character">
                      <xsl:param name="text" as="xs:string"/>
                      <xsl:param name="character-map" as="element(xsl:character-map)"/>
                      <xsl:param name="pos" select="1" as="xs:integer"/>
                      <xsl:choose>
                        <xsl:when test="$pos le count($character-map/xsl:output-character)">
                          <xsl:variable name="output-character"
                            select="$character-map/xsl:output-character[$pos]"
                            as="element(xsl:output-character)"/>
                          <xsl:variable name="character" select="$output-character/@character"/>
                          <xsl:variable name="string" select="$output-character/@string"/>
                          <xsl:call-template name="replace_character">
                            <xsl:with-param name="text" as="xs:string">
                              <xsl:choose>
                                <xsl:when test="matches($string,'&amp;(.*?);')">
                                  <xsl:message terminate="yes">[ERROR][xsl:character-map
                                    name="<xsl:value-of select="$character-map/@name"/>", xsl:output-character
                                    character="<xsl:value-of select="$character"/>" string="<xsl:value-of
                                    select="$string"/>"] the string attribute tries to produce an entity or
                                    character reference, which would have to be left unescaped: this is not
                                    possible, sorry!</xsl:message>
                                </xsl:when>
                                <xsl:otherwise><xsl:value-of
                                  select="replace($text,$character,$string)"/></xsl:otherwise>
                              </xsl:choose>
                            </xsl:with-param>
                            <xsl:with-param name="character-map" select="$character-map"
                              as="element(xsl:character-map)"/>
                            <xsl:with-param name="pos" select="$pos + 1" as="xs:integer"/>
                          </xsl:call-template>
                        </xsl:when>
                        <xsl:otherwise>
                          <xsl:value-of select="$text"/>
                        </xsl:otherwise>
                      </xsl:choose>
                    </xsl:template>
                  </xsl:stylesheet>
                </p:inline>
              </p:input>
              <p:with-param name="use-character-maps" select="$serialize.use-character-maps"/>
            </p:xslt>
          </p:when>
          <p:otherwise>
            <!-- no character map: just remove the igs:serialize element inserted above -->
            <p:delete match="/*/igs:serialize"/>
          </p:otherwise>
        </p:choose>
        <p:store>
          <p:with-option name="method" select="$serialize.method"/>
          <p:with-option name="encoding" select="$serialize.encoding"/>
          <p:with-option name="byte-order-mark"
            select="if ($serialize.byte-order-mark='yes') then (1) else (0)"/>
          <p:with-option name="cdata-section-elements" select="$serialize.cdata-section-elements"/>
          <p:with-option name="doctype-system" select="$serialize.doctype-system"/>
          <p:with-option name="doctype-public" select="$serialize.doctype-public"/>
          <p:with-option name="escape-uri-attributes"
            select="if ($serialize.escape-uri-attributes='yes') then (1) else (0)"/>
          <p:with-option name="include-content-type"
            select="if ($serialize.include-content-type='yes') then (1) else (0)"/>
          <p:with-option name="indent" select="if ($serialize.indent='yes') then (1) else (0)"/>
          <p:with-option name="media-type" select="$serialize.media-type"/>
          <p:with-option name="normalization-form" select="$serialize.normalization-form"/>
          <p:with-option name="omit-xml-declaration"
            select="if ($serialize.omit-xml-declaration='yes') then (1) else (0)"/>
          <!-- p:store expects true/false/omit, whereas xsl:output uses yes/no/omit -->
          <p:with-option name="standalone"
            select="if ($serialize.standalone='yes') then 'true'
                    else if ($serialize.standalone='no') then 'false'
                    else 'omit'"/>
          <p:with-option name="undeclare-prefixes"
            select="if ($serialize.undeclare-prefixes='yes') then (1) else (0)"/>
          <p:with-option name="version" select="$serialize.version"/>
          <p:with-option name="href" select="$resultdoc.uri"/>
        </p:store>
      </p:when>
      <p:otherwise>
        <!-- no xsl:output was found, so the file is stored with the XProc processor's
          default serialization -->
        <p:store>
          <p:with-option name="href" select="$resultdoc.uri"/>
        </p:store>
      </p:otherwise>
    </p:choose>
  </p:for-each>
  <!-- 5. OUTPUT -->
  <p:identity>
    <p:input port="source"><p:pipe port="result" step="transform"/></p:input>
  </p:identity>
  <igs:logFile>
    <p:with-option name="href" select="$debug.file.uri"/>
  </igs:logFile>
</p:declare-step>
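One possible improvement, offered only as a suggestion: in the replace_character template, fn:replace() treats @character as a regular expression and @string as a replacement pattern, so a map containing characters such as '.', '$' or '\' would misbehave. A character-by-character lookup avoids regular expressions altogether. The sketch below uses a hypothetical igs:apply-character-map function that is not part of the step above:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:igs="http://www.igs-cp.fr"
  exclude-result-prefixes="#all">

  <!-- hypothetical helper: replace every character of $text that has an
       xsl:output-character entry in $character-map by the corresponding
       @string, and keep all other characters unchanged -->
  <xsl:function name="igs:apply-character-map" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="character-map" as="element(xsl:character-map)"/>
    <!-- one lookup per code point: no regular expression is involved, and a
         replacement string is never rewritten again by a later map entry -->
    <xsl:sequence select="string-join(
        for $cp in string-to-codepoints($text)
        return ($character-map/xsl:output-character[@character = codepoints-to-string($cp)]/string(@string),
                codepoints-to-string($cp))[1],
        '')"/>
  </xsl:function>

</xsl:stylesheet>
```

In the character-map stylesheet of the step, the text() template could then simply contain `<xsl:value-of select="igs:apply-character-map(., $character-map)"/>`. The single pass also matches how a serializer applies a character map: the output string of one entry is not re-scanned by the next. The check that rejects `&...;` strings would have to be kept separately if wanted.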


<igs:logFile> helps with debugging; it was inspired by Geert's ut:log
step in xproc-ebook-conv (see https://github.com/grtjn/xproc-ebook-conv).
Thanks to him!

<p:declare-step type="igs:logFile" name="current">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"><p:pipe port="source" step="current"/></p:output>
  <p:input port="parameters" kind="parameter"/>
  <p:option name="href" required="true" cx:type="xsd:anyURI"/>
  <p:option name="method" select="'xml'"/>
  <p:option name="indent" select="'true'"/>
  <p:wrap-sequence wrapper="sequence"/>
  <igs:parameters2xml name="params"/>
  <p:group>
    <p:variable name="debug" select="(//c:param[@name='debug']/@value, false())[1]">
      <p:pipe port="parameters" step="params"/>
    </p:variable>
    <p:choose>
      <p:when test="string($debug) eq 'true'">
        <cx:message><p:with-option name="message" select="concat('Log file: ', $href)"/></cx:message>
        <p:store>
          <p:with-option name="href" select="$href"/>
          <p:with-option name="method" select="$method"/>
          <p:with-option name="indent" select="$indent"/>
        </p:store>
      </p:when>
      <p:otherwise>
        <!-- The output is already produced: it is the same as the input -->
        <p:sink/>
      </p:otherwise>
    </p:choose>
  </p:group>
</p:declare-step>
<p:declare-step type="igs:parameters2xml" name="current">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="in-parameters" kind="parameter" sequence="true" primary="true"/>
  <p:output port="result" sequence="true" primary="true">
    <!-- pipe input straight through to output -->
    <p:pipe step="current" port="source"/>
  </p:output>
  <!-- extra output port for cleaned params -->
  <p:output port="parameters" sequence="false" primary="false">
    <p:pipe step="params" port="result"/>
  </p:output>
  <p:parameters name="params">
    <p:input port="parameters">
      <p:pipe step="current" port="in-parameters"/>
    </p:input>
  </p:parameters>
</p:declare-step>
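For example, igs:logFile only writes its file when a parameter named `debug` equals 'true'. Assuming the step's single parameter port acts as its primary parameter port (as I believe it does in XProc 1.0), a call with logging forced on might look like this; 'my_step.xml' is a placeholder and exf:cwd() is used as elsewhere in this mail:

```xml
<igs:logFile>
  <p:with-option name="href" select="resolve-uri('my_step.xml',string(exf:cwd()))"/>
  <!-- igs:logFile stores its input only when the "debug" parameter is 'true' -->
  <p:with-param name="debug" select="'true'"/>
</igs:logFile>
```

In practice it is probably simpler to pass debug=true once as a pipeline parameter: an unconnected primary parameter port is bound to the primary parameter port of the enclosing pipeline, so every igs:logFile call should then see it.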

2014/1/20 Florent Georges <fgeorges@fgeorges.org>

> On 20 January 2014 10:10, RICAUD-DUSSARGET Matthieu wrote:
>
> > It seems disable-output-escaping="yes" doesn't work when running
> > a <p:xslt>
>
>   Note that D-O-E is taken into account only when the XSLT processor
> is taking care of the serialization, which I believe is not the case
> with p:xslt in Calabash.
>
>   Regards,
>
> --
> Florent Georges
> http://fgeorges.org/
> http://h2oconsulting.be/
>
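To illustrate the difference (a sketch, not tested with Calabash): the first instruction below depends on the XSLT serializer and, per Florent's note, is ignored when p:xslt passes the result tree to XProc, whereas the second writes the character itself into the result tree, so any serializer can output it.

```xml
<!-- only effective when the XSLT processor serializes the result itself:
     the output file then contains the reference &#160; unescaped -->
<xsl:value-of select="'&amp;#160;'" disable-output-escaping="yes"/>

<!-- independent of the serializer: the result tree holds U+00A0 itself -->
<xsl:value-of select="'&#160;'"/>
```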



-- 
Matthieu Ricaud-Dussarget
IGS-CP - Développeur XML
05 45 37 09 49
Received on Tuesday, 21 January 2014 14:28:24 UTC
