Re: FO->Area as XSLT extension function

On Sun, March 17, 2013 4:46 pm, Arved Sandstrom wrote:
> I'll be interested to see the XSLT experts here (which I suspect is
> everyone except me) develop and refine the usage pattern for the
> extension function. One immediate proof-of-concept (POC) pattern, I
> would guess based on discussion to date, is then to use document() with
> XPath to check one or more settings. Again speculating out loud, we have

Yes, pretty much.  Being able to do a simple 'trial run' of parts of the
output would help with 'Customer Requirements' [1] scenarios #1, #9, and
#10.  For a more liberal interpretation of 'simple', it could be said to
also cover #5, #6, and #7, and for a very elastic definition of 'simple'
could handle #3 and #4.

I don't see it particularly helping with #2 or #8.  For #2, "If a
paragraph starts on top of a page, it must not be indented", you'd have to
lay out the whole document to see which blocks are at the tops of pages,
and you'd have to hope that reducing the 'text-indent' [2] doesn't cause a
block to lose a short last line and so cause other things to shuffle up
and so change what is at the top of subsequent pages.

For #8, "Ability to span tables across columns in a multicolumn page when
the table is embedded in lower level hierarchy", a current XSL-FO
processor wouldn't be able to do that without having implemented it as an
extension, and the computation to rejigger the FO tree using XSLT so the
area tree came out the way you want would be fiendishly complex.

> access to both the FO node tree as an XSLT variable at that point, and
> in theory also the FOP Intermediate Format (IF) node tree if we cared to
> obtain it. So either/or would be modifiable (new variable and all that,
> but that's the idea).
>
> Am I off base here?

You're ahead of me in thinking of the FO that's passed to the extension
function as a variable that could be modified (more accurately, a second
variable could be created that is a modified copy of the first).  I'd so
far only thought of using the returned area tree to work out some
parameter value that would be passed to templates/functions that would
produce the final FO output, but in scenarios that lend themselves to
iterative solutions such as #1, "Decrease font size until text fits in a
given box", you may, in fact, be better off tweaking the same FO fragment
and reusing the last iteration's FO in the final FO output.
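
As a rough sketch of that iterative pattern for #1 (in Python rather than
XSLT, purely for brevity), the loop below keeps the FO fragment from the
last trial so it can be reused in the final output.  Everything named here
is hypothetical: make_fo() stands in for whatever template builds the
fragment, and toy_measure() stands in for the extension function plus the
XPath that would read a height out of the returned area tree.  The toy
measurement is made up so that the example runs on its own.

```python
import re


def make_fo(size):
    """Hypothetical stand-in for the template that builds the FO fragment."""
    return f'<fo:block font-size="{size}pt">Some text</fo:block>'


def toy_measure(fo):
    """Hypothetical stand-in for 'format the fragment, read height from the
    area tree'. Pretends the text always sets as four lines of font-size height."""
    size = float(re.search(r'font-size="([\d.]+)pt"', fo).group(1))
    return size * 4


def shrink_to_fit(make_fo, measure_height, box_height,
                  start_size=12.0, min_size=6.0, step=0.5):
    """Decrease the font size until the fragment fits in box_height.

    Returns (size, fo), where fo is the fragment from the last trial,
    so the caller can reuse it instead of rebuilding it.
    Stops at min_size even if the text still does not fit.
    """
    size = start_size
    while True:
        fo = make_fo(size)
        if measure_height(fo) <= box_height or size - step < min_size:
            return size, fo
        size -= step


size, fo = shrink_to_fit(make_fo, toy_measure, box_height=40.0)
print(size, fo)
```

In XSLT the loop would be a recursive template or function, with each
trial's FO held in a new variable rather than reassigned.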

> I can easily enough modify the extension function to specify input and
> output modes for FOP runs. Logical outputs are IF (1st run) and PDF (2nd
> run). Inputs can be either FO or IF in our scenario.

Again, you're ahead of me.  What do others think?

There are two things that we could run up against if we head towards
pervasive use of this sort of 'partial XSL-FO processing': the XSLT
processing model and memory usage limits.

Section 18.1.2, "Calling Extension Functions" [3], of the XSLT 2.0 spec
includes:

   There is no prohibition on calling extension functions
   that have side-effects (for example, an extension
   function that writes data to a file). However, the order
   of execution of XSLT instructions is not defined in this
   specification, so the effects of such functions are
   unpredictable.

Ideally the order could be made predictable by passing the function
result, or something derived from the function result, as a parameter to
other templates such that the function has to produce a result before the
dependent templates can run.

If it gets to the point of a stylesheet doing a lot of partial FO
processing, it would help the memory usage if the XSLT processor would
recognise when an area tree or partial FO tree has gone out of scope and
would garbage-collect it when it is no longer needed, particularly if
scenarios such as #1 require a lot of recursion to narrow in on the
correct result.
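
One way to limit both the number of trial layouts and how many trees are
alive at once is to narrow in by bisection and keep only a yes/no answer
from each trial.  The sketch below is Python, not XSLT, and it assumes
that fitting is monotonic (if a size fits, every smaller size fits too).
The fits() function is hypothetical: it stands in for 'format the
fragment and test the area tree', and its toy arithmetic is made up for
the example.  Whether an XSLT processor would actually release each
trial's tree is up to the processor; the sketch only shows the shape of
the recursion.

```python
def largest_fitting_size(fits, lo, hi, tolerance=0.25):
    """Bisect for the largest size in [lo, hi] for which fits(size) is true.

    Assumes fits() is monotonic. Returns None if even lo does not fit.
    Only a boolean comes back from each trial, so whatever tree the trial
    built is unreachable as soon as fits() returns.
    """
    if fits(hi):
        return hi
    if not fits(lo):
        return None
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if fits(mid):
            lo = mid   # mid fits: the answer is at least mid
        else:
            hi = mid   # mid overflows: the answer is below mid
    return lo


trials = []  # records each size tried, to show how few layouts are needed


def fits(size, box_height=40.0):
    """Hypothetical trial layout: pretends the text sets as four lines."""
    trials.append(size)
    return size * 4 <= box_height


best = largest_fitting_size(fits, lo=6.0, hi=12.0)
print(best, len(trials))
```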

Regards,


Tony.

[1] http://www.w3.org/community/ppl/wiki/CustomerRequirements
[2] http://www.w3.org/TR/xsl11/#text-indent
[3] http://www.w3.org/TR/xslt20/#calling-extension-functions

Received on Monday, 18 March 2013 13:36:12 UTC