Re: Viewport and nested selected nodes

From: Jeni Tennison <jeni@jenitennison.com>
Date: Fri, 27 Oct 2006 10:20:48 +0100
Message-ID: <4541CF70.9010102@jenitennison.com>
To: public-xml-processing-model-wg@w3.org

Norman Walsh wrote:
> / Jeni Tennison <jeni@jenitennison.com> was heard to say:
> | The 'select' for computed ports (and the input for for-each *is* a computed
> | port) must identify all nodes in the document, not just the outermost ones. My
> | use case is creating multi-page HTML output in which every section (including
> | sections-within-sections) has a separate page. I want to do this with:
> |
> |   <p:for-each>
> |     <p:input port="section" ... select="//section" />
> |     <p:output port="result" step="toHTML" source="result" />
> |     <p:step name="toHTML" type="p:xslt">
> |       <p:input port="document" source="section" />
> |       <p:input port="stylesheet" href="section2html.xsl" />
> |     </p:step>
> |   </p:for-each>
> 
> If you expect not only the outer sections but also the inner sections
> to be selected, how do you imagine that this works? 

I imagine a stylesheet like:

<xsl:stylesheet version="1.0"
                 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/section">
   <html>
     <head>
       <title><xsl:value-of select="title" /></title>
     </head>
     <body>
       <xsl:apply-templates />
     </body>
   </html>
</xsl:template>

...

<xsl:template match="section">
   <p>
     <a href="{@id}.html"><xsl:value-of select="title" /></a>
   </p>
</xsl:template>

</xsl:stylesheet>

Every section gets its own HTML page. When you process a given section, 
any nested sections are turned into links to their HTML page. This is an 
extremely common way of presenting large technical documents.
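
The page-per-section idea can be sketched outside XSLT as well. This is a rough Python illustration, not the stylesheet itself: page_for and the <doc> wrapper are hypothetical names, each section is assumed to carry a <title> child and an id attribute, and the page body is reduced to the links for nested sections (the templates elided by "..." above are not reproduced).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def page_for(section):
    """Build (filename, html) for one section; child sections become links only."""
    title = escape(section.findtext("title", default=""))
    links = []
    for child in section.findall("section"):
        child_title = escape(child.findtext("title", default=""))
        links.append('<p><a href="%s.html">%s</a></p>' % (child.get("id"), child_title))
    html = "<html><head><title>%s</title></head><body>%s</body></html>" % (
        title, "".join(links))
    return section.get("id") + ".html", html

# Hypothetical input: one section containing one nested section.
DOC = """<doc>
<section id="s2"><title>Two</title>
  <section id="s2.1"><title>Two point one</title></section>
</section>
</doc>"""

# Every section, nested or not, yields its own page.
pages = dict(page_for(s) for s in ET.fromstring(DOC).iter("section"))
for name in pages:
    print(name)
```

The page for s2 only links to s2.1.html; the content of s2.1 appears on its own page.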

> Consider:
> 
> <section id="s1">
>   <p/>
> </section>
> <section id="s2">
>   <p/>
>   <section id="s2.1">
>     <p/>
>   </section>
> </section>
> 
> In the first iteration, the XSLT process gets
> 
>   <section id="s1">
>     <p/>
>   </section>
> 
> as input. In the second iteration, what does it get?
> 
> <section id="s2">
>   <p/>
>   <section id="s2.1">
>     <p/>
>   </section>
> </section>
> 
> or
> 
> <section id="s2">
>   <p/>
> </section>
> 
> If it gets the former, then in the third iteration, it's going to
> process the s2.1 section *again* which isn't likely to be what users
> expect. It sure isn't what I'd expect.

It gets the former. If you want to process only the top-level sections, 
then select only those (with "/article/body/section" or something 
similar). In my use case above, I want every section in the document 
to be turned into a separate HTML document.
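
To make the "select everything" reading concrete, here is a minimal sketch in Python rather than XProc. It assumes Norm's example is wrapped in a hypothetical <doc> root so that it is well-formed; select_all_sections is an illustrative name, not part of any proposal.

```python
import xml.etree.ElementTree as ET

# Norm's example, wrapped in a hypothetical <doc> root element.
DOC = """<doc>
<section id="s1"><p/></section>
<section id="s2"><p/><section id="s2.1"><p/></section></section>
</doc>"""

def select_all_sections(root):
    """Return every section element in document order, like //section."""
    return root.findall(".//section")

root = ET.fromstring(DOC)
for sec in select_all_sections(root):
    # Each selected section is handed over with its full subtree,
    # so s2 still contains s2.1 when it is processed.
    nested = [s.get("id") for s in sec.findall("section")]
    print(sec.get("id"), "nested:", nested)
```

Three iterations result: s1, s2 (still containing s2.1), and s2.1 on its own.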

Of course there are other ways to do this, such as writing a stylesheet 
that creates all the HTML documents concatenated into one huge document 
and then splitting it up. However, as I understand it, XSLT is much more 
efficient when it processes and creates small documents and the latter 
approach can't be optimised via streaming or parallel processing. If I'm 
bothering to use XProc rather than XSLT 2.0, I want some performance 
benefits.

> I expect for-each and viewport (and actually, maybe all select
> expressions in XProc) to select only the highest level matching
> subtrees.

Viewport should certainly select only the highest level matching (or 
selected) subtrees. I can't see how it would work any other way.
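
A minimal sketch of "highest level matching subtrees only", again in Python rather than XProc. The function name outermost and the <doc> wrapper are assumptions made for illustration; the walk stops descending as soon as it hits a match, so a matching element nested inside another match is never selected separately.

```python
import xml.etree.ElementTree as ET

def outermost(root, tag):
    """Collect elements named `tag` that have no ancestor named `tag`."""
    found = []

    def walk(elem):
        for child in elem:
            if child.tag == tag:
                found.append(child)  # a match: do not look inside it
            else:
                walk(child)          # not a match: keep descending

    walk(root)
    return found

# Norm's example, wrapped in a hypothetical <doc> root element.
DOC = """<doc>
<section id="s1"><p/></section>
<section id="s2"><p/><section id="s2.1"><p/></section></section>
</doc>"""

viewport_root = ET.fromstring(DOC)
print([s.get("id") for s in outermost(viewport_root, "section")])
```

Here s2.1 is reached only as part of s2, which is what replacing subtrees in place requires.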

I think a similar restriction on the 'select' attribute on <p:input> 
(and the "p:select" step type) is unwarranted.

> If this means the attribute has to be renamed, I suppose I can live
> with that, though I'm not personally disturbed by the additional
> semantic.

The only renaming I want is (a) from 'select' to 'match' if we go with 
XSLT patterns rather than XPath expressions for identifying nodes and 
(b) to distinguish between the 'select' attribute as a shorthand for 
'p:select' and the attribute on <p:viewport> that identifies the 
subtrees that are to be processed.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Friday, 27 October 2006 09:21:18 GMT
