Re: Calling for a massive revamp of Paged Media and GCPM

Another aspect of eBooks built with HTML/CSS is that content is usually divided into several HTML files, sometimes for logical reasons (content organisation, e.g. chapters), but also to meet the functional constraints of portable reading devices (hardware and memory). In the typical reflowable case (i.e. not "fixed layout"), this can introduce dynamic pagination issues, for example when the author intends a single rendered page to seamlessly combine content from two adjacent DOM instances, as if both lived within one consistent CSS space. There are interesting thoughts on this topic on the EPUB Working Group forum / mailing list:

https://groups.google.com/forum/?fromgroups=#!topic/epub-working-group/SCGDUpgFUeI

Regards, Daniel

On 13 Jan 2013, at 11:46, Daniel Glazman wrote:

> 
> Hi there,
> 
> Our Web world quietly changed some time ago, when Web standards
> started leaving the sole domain of the Web to reach new fields. One
> that's particularly interesting to me is the world of eBooks.
> 
> Working on content editors, I have always wondered why the word
> processors we have on the market need to define their own proprietary
> formats. Before dropping Nvu and starting BlueGriffon, I pondered
> writing not a new content editor for the Web but a new word processor,
> strictly HTML+CSS+MathML+SVG-based. I eventually deferred that project,
> but it still runs through my mind from time to time because I know I
> _can_ do it.
> 
> Recently I started something I absolutely need for my EPUB editor:
> an importer for *.docx files. A *.docx file is basically a zip archive
> encapsulating XML-based metadata for a Word document and an Office
> Open XML version of the document (I'm simplifying, but I'm sure you
> get the big picture; if you want to learn more, the spec is 5000+
> pages long...).
> 
> The docx zip contains xml document instances for the main content,
> the multiple headers and footers, the styles, the themes, etc.
> 
> Since the work of many, really many, book authors and publishers is
> still almost entirely based (for better or worse) on Word formats,
> being able to cleanly import and translate documents exported by Word
> into an xml serialization of html5 becomes a key factor for the
> success of electronic books. Ebooks have even started leaving the
> field of books to reach the press, magazines, knowledge management,
> and more. And html5+CSS are now massively used for slideshows.
> 
> From that perspective, CSS and html5 have a few major holes I would
> like to discuss here:
> 
> 1. the Paged Media and Generated Content for Paged Media specs do not
>   allow content-rich headers and footers at this time. This most
>   certainly comes from the fact that our browsers print web pages
>   letting users specify, through the UI, the contents of 6 areas above
>   and below the page content: the URL, the title, the page number, the
>   date, etc. But no really rich content.
>   But if you take an average document coming from a word processor,
>   for instance a contract proposal or an invoice sent by my company,
>   the headers and footers are MUCH more complex than that; they
>   contain real, complex content that conveys much more information
>   than just pagination or document-wide metadata. Our current CSS specs
>   erroneously consider that the headers and footers of a page are only
>   Generated Content from the document, and therefore belong to
>   presentation rather than content. They do not: they really are
>   content, they should live on the markup side of the document, and
>   our whole history of word processing easily demonstrates it.
>   With respect to that requirement, the generated content portions of
>   the Paged Media specs seem to me completely outdated.
> 
> 2. the Paged Media spec divides the margin of a page into 16 (!) areas
>   where CSS can generate content from the markup through CSS rules
>   only. At a time when we are discussing slot generation through
>   dedicated at-rules and flows of content, when we have flexbox to
>   nicely and precisely place data wherever we want, and when we have
>   Grid Layout to finely divide a layout into slots, this seems to me a
>   suboptimal and overly complex solution. It's unmaintainable from an
>   author's perspective. And it's clearly not enough to import a
>   document coming from a word processor without dropping a lot of the
>   data living in its headers and footers.
> 
> 3. even if we do have header and footer elements in html5, CSS is
>   currently too weak to allow authors to give them a rich
>   presentation, including in paged or print media, for instance by
>   allowing them to
>   persist across the pages of a given section.
> 
> 4. the GCPM module allows content "creation" in headers and footers
>   through the 'content' property. But the functional notation content()
>   defined in the same spec only allows retrieving the textual contents
>   of a given element without capturing its richness. This is far from
>   enough. A much better way of doing it would be, for instance, to
>   define a page header as the 'flow-from' destination of elements
>   carrying a 'flow-into' property. We could also have a specific, very
>   simple property declaring that the flow should persist from one page
>   to the following ones unless a later page itself sends elements into
>   the header's flow.
> 
> 5. footnotes in GCPM seem to me a tortured solution. I agree that
>   footnotes are an extremely complex problem. But wait: adding a
>   footnote counter is easy, and we have both ::before and ::after.
>   If we flow footnotes into a footnote area defined as in 4 above,
>   we could use ::before as the counter reference that stays with
>   the footnote's prose, and ::after as the footnote's source that
>   remains in the main prose. All we need for that is a way of
>   specifying that generated content does not flow with its parent
>   element. And if your footnote is a link AND the target of that
>   link, clicking on the linkified ::after will even take you to the
>   footnote in the footnotes area by pure magic. We don't need all the
>   extra stuff the GCPM spec specifies.
> 
> 6. similarly, bookmarks are defined by GCPM as being presentational. I
>   disagree with that approach. For me, a bookmark is clearly an
>   annotation, and annotations are content. We could use for bookmarks
>   a mechanism very similar to the one I outlined for footnotes above,
>   with a slot and a flow. Simple, efficient, rich, clean.
> 
> 7. the main content area of a page can easily be defined as the
>   subtraction of all the flow areas defining headers, footers,
>   footnotes, etc. from the page area (cf. the terminology in section
>   3.1 of the Paged Media spec). This means that headers, footers,
>   footnotes and friends can easily be defined as Exclusions from the
>   page area. Of course, it is still also possible to define a flow
>   for the main content of a document and send that flow to specified
>   slots/areas/grid cells/... in pages.
> 
> In summary, Paged Media and Generated Content for Paged Media paved the
> way. But they were never really implemented, except in YesLogic's
> PrinceXML, and they now show their limits given the new industries
> using html and CSS extensively. I am calling for a massive revamp of
> these documents based on Regions, Flows, Slots, Exclusions, Grids and
> the @page rule. We're just too far behind what real-life paged media
> need from CSS.
> 
> </Daniel>
> 
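
The quoted message describes a *.docx file as a zip archive holding separate XML parts for the main content, the headers and footers, the styles and the themes. A minimal sketch of the first step of such an importer, using only Python's standard zipfile module, could simply group those parts by role. It assumes the part names Word usually writes (word/document.xml, word/headerN.xml, word/footerN.xml, word/styles.xml, word/theme/themeN.xml); a robust importer should resolve parts through the package relationships (word/_rels/document.xml.rels) instead of relying on names. The function names are illustrative only.

```python
import io
import re
import zipfile
from collections import defaultdict

# Assumed naming conventions for the parts Word usually writes into a .docx;
# a real importer should follow word/_rels/document.xml.rels instead.
PART_PATTERNS = {
    "main": re.compile(r"^word/document\.xml$"),
    "header": re.compile(r"^word/header\d*\.xml$"),
    "footer": re.compile(r"^word/footer\d*\.xml$"),
    "styles": re.compile(r"^word/styles\.xml$"),
    "theme": re.compile(r"^word/theme/theme\d*\.xml$"),
}


def classify_docx_parts(source) -> dict:
    """Group the XML parts of a .docx archive by role.

    `source` may be a file path or a binary file-like object.
    Parts matching none of the assumed patterns are ignored.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(source) as archive:
        for name in sorted(archive.namelist()):
            for role, pattern in PART_PATTERNS.items():
                if pattern.match(name):
                    groups[role].append(name)
                    break
    return dict(groups)


def make_fake_docx() -> io.BytesIO:
    """Build a tiny in-memory archive that only mimics a .docx layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in [
            "[Content_Types].xml",
            "word/document.xml",
            "word/header1.xml",
            "word/header2.xml",
            "word/footer1.xml",
            "word/styles.xml",
            "word/theme/theme1.xml",
        ]:
            archive.writestr(name, "<placeholder/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(classify_docx_parts(make_fake_docx()))
```

Each header and footer lands in its own XML part, which is why a faithful import into html5 needs somewhere richer than CSS-generated margin text to put that content.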

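Point 4 of the quoted message proposes making a page header the 'flow-from' destination of elements carrying 'flow-into', plus a property that makes the header flow persist onto following pages until a page sends its own elements into it. No such persistence property exists in CSS; the Python sketch below only models the proposed rule so its semantics are concrete. The Page class, resolve_headers() and the persist flag are all hypothetical names, not any CSS or browser API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """A laid-out page in a hypothetical pagination model (not a CSS API)."""
    number: int
    # Elements on this page that would carry 'flow-into' targeting the header flow.
    header_elements: List[str] = field(default_factory=list)


def resolve_headers(pages: List[Page], persist: bool = True) -> List[Optional[List[str]]]:
    """Return the header content shown on each page, in page order.

    A page that sends elements into the header flow shows them. Otherwise,
    with `persist` (the proposed property) it repeats the previous page's
    header; without it, the header area stays empty (None).
    """
    resolved: List[Optional[List[str]]] = []
    current: Optional[List[str]] = None
    for page in pages:
        if page.header_elements:
            current = list(page.header_elements)
        elif not persist:
            current = None
        # Copy so that mutating one page's header cannot affect another's.
        resolved.append(list(current) if current is not None else None)
    return resolved


if __name__ == "__main__":
    book = [Page(1, ["Chapter 1"]), Page(2), Page(3, ["Chapter 2"]), Page(4)]
    print(resolve_headers(book))
    # → [['Chapter 1'], ['Chapter 1'], ['Chapter 2'], ['Chapter 2']]
```

Modelling persistence as a simple carry-forward keeps the rule local: a page only needs to know what the previous page displayed, which is one plausible reading of "persist ... unless a later page itself sends elements into the header's flow".
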
Received on Sunday, 13 January 2013 18:11:12 UTC