Re: calling for xproc pain points, requested features, etc from Imsieke, Gerrit, le-tex on 2012-01-06 (xproc-dev@w3.org from January 2012)

From: Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de>
Date: Fri, 06 Jan 2012 11:09:56 +0100
To: Michael Sokolov <sokolov@ifactory.com>
CC: xproc-dev@w3.org
Message-ID: <4F06C874.9030904@le-tex.de>

On 2012-01-06 05:18, Michael Sokolov wrote:
> On 1/5/2012 9:46 PM, Imsieke, Gerrit, le-tex wrote:
>>
>> Consider the case of an EPUB that contains XHTML files that link to
>> CSS stylesheets. We have implemented a CSS parser in XSLT2. In order
>> to make it work without too many annoying workarounds, we have to
>> unpack the zip file first.
> We found it convenient to write a URIResolver that pulls content out of
> zip files; that way we can process all the epub pieces in place without
> the need for temporary files. It didn't present great difficulties,
> although I may be missing something - we handled the spine and the rest,
> but don't have a CSS parser - that sounds neat! But some ability to
> enumerate the contents of the zip, (or to pull out the contents of a
> "folder" from the zip - a kind of artificial construct) is a big help.

Sounds really neat. Have you published the resolver somewhere? For our 
purposes, it would have to co-operate with a catalog resolver.
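
To make sure we are talking about the same thing, here is a minimal
Python sketch of the "in place" idea as I understand it. It is not your
resolver (which I haven't seen and which I assume is a Java
URIResolver); the function names list_folder and read_member and the
sample archive contents are made up for illustration, and catalog
resolution is not covered at all.

```python
import io
import zipfile

def list_folder(archive, folder):
    """Names of the members below `folder`. A zip has no real directories,
    so a "folder" is just a common name prefix."""
    prefix = folder.rstrip("/") + "/"
    return [n for n in archive.namelist() if n.startswith(prefix) and n != prefix]

def read_member(archive, name):
    """Bytes of one member, read straight from the archive (no temp file)."""
    return archive.read(name)

# Build a tiny EPUB-like archive in memory (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("mimetype", "application/epub+zip")
    z.writestr("OEBPS/chapter1.xhtml", "<html/>")
    z.writestr("OEBPS/style.css", "p { color: red }")

# Enumerate a "folder" and pull one stylesheet out without unpacking.
with zipfile.ZipFile(buf) as epub:
    names = list_folder(epub, "OEBPS")
    css = read_member(epub, "OEBPS/style.css").decode("utf-8")
```

A resolver built on this would map a relative href in the XHTML to a
member name and hand back the bytes, which is the part that would have
to be combined with catalog lookups in our setup.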

FYI, the CSS parser is here:
https://github.com/gimsieke/epubcheck-xproc/blob/master/xproc/css.xpl
and of course in the XSLT files used therein, namely
https://github.com/gimsieke/epubcheck-xproc/blob/master/xsl/css-parser.xsl
https://github.com/gimsieke/epubcheck-xproc/blob/master/xsl/css2xsl.xsl
https://github.com/gimsieke/epubcheck-xproc/blob/master/xsl/css-util.xsl

It is a three-step process:
1. For a given XHTML document, extract CSS info from linked stylesheets, 
inline style elements, and style attributes. Record the priority of 
every rule.
2. Generate an XSLT stylesheet out of this CSS info. The CSS priorities 
translate to template priorities.
3. Apply the generated XSLT stylesheet to the original XHTML file in 
order to make every CSS property on each element explicit as an @css:* 
attribute.
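
The three steps can be illustrated with a small Python sketch. This is
not the actual implementation, which is XSLT 2 and generates a real
stylesheet in step 2; here steps 2 and 3 are collapsed into one function
that applies the rules in priority order. It assumes un-namespaced
markup, handles only element and .class selectors, and all function
names are hypothetical.

```python
import xml.etree.ElementTree as ET

CSS_NS = "http://www.w3.org/1996/css"
ET.register_namespace("css", CSS_NS)

def parse_declarations(text):
    """Turn 'color: red; margin-top: 2em' into {'color': 'red', ...}."""
    decls = {}
    for part in text.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            decls[name.strip()] = value.strip()
    return decls

def extract_rules(css_text):
    """Step 1: collect (priority, selector, declarations) triples.
    Simplification: a class selector outranks an element selector, and
    among equals the later rule wins (source order as tie-breaker)."""
    rules = []
    for order, chunk in enumerate(css_text.split("}")):
        if "{" not in chunk:
            continue
        selector, body = chunk.split("{", 1)
        selector = selector.strip()
        specificity = 10 if selector.startswith(".") else 1
        rules.append(((specificity, order), selector, parse_declarations(body)))
    return rules

def matches(elem, selector):
    """True if the element matches an 'elem' or '.class' selector."""
    if selector.startswith("."):
        return selector[1:] in elem.get("class", "").split()
    return elem.tag == selector

def expand(root, rules):
    """Steps 2 and 3 collapsed: apply rules in ascending priority so that
    higher priorities overwrite lower ones; the style attribute wins last.
    Each resulting property becomes a css:* attribute on the element."""
    for elem in root.iter():
        props = {}
        for _, selector, decls in sorted(rules, key=lambda r: r[0]):
            if matches(elem, selector):
                props.update(decls)
        props.update(parse_declarations(elem.get("style", "")))
        for name, value in props.items():
            elem.set("{%s}%s" % (CSS_NS, name), value)
    return root
```

In the XSLT version the sort is unnecessary because the template
priority attribute does the same job: the processor picks the
highest-priority matching template per property.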

That is, the CSS properties that apply to an element, whether through 
style or class attributes, are expanded into XML attributes in the 
http://www.w3.org/1996/css namespace, e.g.
<p css:margin-top="2em" css:color="red" css:font-family="Arial, sans-serif" class="foo">
so that you can use them for grouping, in Schematron rules, …
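
As an illustration of the grouping use case, here is a short Python
sketch that groups adjacent elements by one of the expanded attributes.
In our pipeline this would be xsl:for-each-group or a Schematron rule;
the function name group_by_css and the sample document are made up.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

CSS_NS = "http://www.w3.org/1996/css"

def group_by_css(parent, prop):
    """Group adjacent children of `parent` by the value of css:<prop>.
    Returns a list of (value, [text of each child]) pairs."""
    def key(elem):
        return elem.get("{%s}%s" % (CSS_NS, prop))
    return [(value, [e.text for e in items])
            for value, items in groupby(list(parent), key)]

# Hypothetical output of the expansion step.
doc = ET.fromstring(
    '<body xmlns:css="http://www.w3.org/1996/css">'
    '<p css:margin-top="2em" css:color="red">a</p>'
    '<p css:margin-top="2em" css:color="red">b</p>'
    '<p css:color="black">c</p>'
    '</body>'
)
groups = group_by_css(doc, "color")
```

Because the properties are plain attributes, no CSS knowledge is needed
at this point; any XPath-capable tool can test them.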

Although the epubcheck-xproc project is still quite immature, the CSS 
parser and the mechanism to patch Schematron findings back into the HTML 
files (step epub:schematron-spinehtml in the 
https://github.com/gimsieke/epubcheck-xproc/blob/master/xproc/epub.xpl 
library) are already quite usable.

Gerrit

Received on Friday, 6 January 2012 10:10:50 UTC