- From: Daniel Glazman <daniel.glazman@disruptive-innovations.com>
- Date: Sun, 13 Jan 2013 12:46:12 +0100
- To: "www-style@w3.org" <www-style@w3.org>
Hi there,
Our Web world changed quietly some time ago, when Web standards started
leaving the sole domain of the Web to reach new fields. One that's
particularly interesting to me is the world of eBooks.
Working on content editors, I have always wondered why the word
processors we have on the market need to define their own proprietary
formats. Before dropping Nvu and starting BlueGriffon, I pondered
writing not a new content editor for the Web but a new word processor,
strictly HTML+CSS+MathML+SVG-based. I eventually deferred that project
but it still crosses my mind from time to time because I know I _can_
do it.
Then I recently started something I absolutely need for my EPUB editor:
an importer for *.docx files. *.docx files are basically a zip
encapsulating xml-based metadata for a Word document and an Office Open
XML version of the document (I'm simplifying but I'm sure you get the
big picture; if you want to learn more, the spec is 5000+ pages long...).
The docx zip contains xml document instances for the main content,
the multiple headers and footers, the styles, the themes, etc.
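To make the package layout concrete, here is a minimal Python sketch. It builds a tiny in-memory zip laid out like a *.docx and groups its parts by role. The part names follow the layout Word commonly uses, but the XML payloads are empty placeholders, the helper names are hypothetical, and grouping by file name is only a heuristic (a real importer would presumably resolve parts through the package's relationship files).

```python
import io
import zipfile

# Simplified, illustrative part list. The names follow the layout Word
# commonly uses; the XML payloads are empty placeholders (assumption).
SAMPLE_PARTS = {
    "[Content_Types].xml": "<Types/>",
    "word/document.xml": "<document/>",
    "word/styles.xml": "<styles/>",
    "word/header1.xml": "<hdr/>",
    "word/footer1.xml": "<ftr/>",
    "word/theme/theme1.xml": "<theme/>",
}


def build_sample_docx():
    """Build a tiny in-memory zip laid out like a .docx package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml in SAMPLE_PARTS.items():
            archive.writestr(name, xml)
    return buffer.getvalue()


def classify_parts(docx_bytes):
    """Group the parts of a package by the role their name suggests."""
    roles = ("main", "headers", "footers", "styles", "themes", "other")
    groups = {role: [] for role in roles}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            base = name.rsplit("/", 1)[-1]  # file name without its folder
            if name == "word/document.xml":
                groups["main"].append(name)
            elif base.startswith("header"):
                groups["headers"].append(name)
            elif base.startswith("footer"):
                groups["footers"].append(name)
            elif base == "styles.xml":
                groups["styles"].append(name)
            elif name.startswith("word/theme/"):
                groups["themes"].append(name)
            else:
                groups["other"].append(name)
    return groups


if __name__ == "__main__":
    for role, names in classify_parts(build_sample_docx()).items():
        print(role, names)
```

The point of the sketch is only that headers and footers arrive as separate XML document instances next to the main content, so an importer has real markup to translate for them.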
Since the work of many, really many, book authors and publishers is still
almost entirely based (for better or worse) on Word formats, being able to
import/translate a format cleanly exported by Word into an xml
serialization of html5 becomes a key factor for the success of
electronic books. Ebooks have even started leaving the field of books
to reach press, magazines, knowledge management, and more. And
html5+CSS are now massively used for slideshows.
From that perspective, CSS and html5 have a few major holes I would like
to discuss here:
1. the Paged Media and Generated Content for Paged Media specs do not
allow content-rich headers and footers at this time. This most
certainly comes from the fact that our browsers, when printing web
pages, let the user specify through the UI the contents of 6 areas
above and below the page content: the URL, the title, the page number,
the date, etc. But no real rich content.
But if you take an average document coming from a word processor,
for instance a contract proposal or an invoice sent by my company,
the headers and footers are MUCH more complex than that; they
contain real, complex content that conveys much more information than
just pagination or document-wide metadata. Our current CSS specs
erroneously consider that the headers and footers of a page are only
Generated Content from the document, and that they therefore belong to
presentation and not content. They do not; they are really content and
should live on the markup side of the document, as our whole history in
word processing easily demonstrates.
With respect to that requirement, the generated content portions of
the Paged Media specs seem to me completely outdated.
2. the Paged Media spec divides the margin of a page into 16 (!) areas
where CSS can generate content from the markup through CSS rules
only. At a time when we discuss slot generation through dedicated
at-rules and flows of content, when we have flexbox to place data
nicely and precisely wherever we want, and when we have Grid Layout to
finely divide a layout into slots, this seems to me a suboptimal and
overly complex solution. It's unmaintainable from an
author's perspective. It's clearly not enough to import a document
coming from a word processor without dropping a lot of data living
in headers and footers.
3. even if we do have header and footer elements in html5, CSS is
currently too weak to allow authors to give them a rich presentation
including in paged or print media, for instance allowing them to
persist across the pages of a given section.
4. the GCPM module allows content "creation" into headers and footers
through the 'content' property. But the functional notation content()
defined in the same spec only retrieves the textual contents of a given
element, without capturing its richness. This is far from enough. A
much better approach would be to define, for instance, a page header as
the 'flow-from' destination of elements carrying a 'flow-into'
property. We could also have a specific, very simple property declaring
that the flow should persist from one page to the next ones unless a
later page itself sends elements into the header's flow.
5. footnotes in the GCPM seem to me a tortured solution. I agree
footnotes are an extremely complex problem. But wait, adding a
footnote counter is easy and we have both ::before and ::after.
If we flow footnotes into a footnote area defined as above in 4,
we could use ::before as the counter reference that will stay with
the footnote's prose and ::after as the footnote's source that
remains in the main prose. All we need for it is a way of specifying
that generated content does not flow with its parent element. And if
your footnote is a link AND the target of that link, clicking on the
linkified ::after will even take you to the footnote in the
footnotes' area by pure magic. We don't need all the extra stuff the
GCPM spec specifies.
6. similarly, bookmarks are defined by GCPM as being presentational. I
disagree with that approach. To me, a bookmark is clearly an
annotation, and annotations are content. For bookmarks we could use a
mechanism very similar to the one I outlined for footnotes above,
with a slot and a flow. Simple, efficient, rich, clean.
7. the main content area of a page can easily be defined as the
subtraction of all the flow areas defining headers, footers,
footnotes, etc. from the page area (Cf. terminology in section
3.1 of the Paged Media spec). It means that headers, footers,
footnotes and friends can easily be defined as Exclusions to the
page area. Of course, it is still also possible to define a flow
for the main content of a document and send that flow to specified
slots/areas/grid cells/... in pages.
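The subtraction described in point 7 can be sketched in a few lines of Python. This is a deliberately simplified, one-dimensional model: each area is a horizontal band with a top and a bottom offset, whereas real exclusions would be two-dimensional. The `Band` class, the `subtract_bands` helper, and the sample dimensions are all hypothetical and come from no spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """A horizontal band of the page, measured from the top of the page area."""
    top: float
    bottom: float

    @property
    def height(self):
        return self.bottom - self.top


def subtract_bands(page_area, exclusions):
    """Return the parts of page_area not covered by any exclusion, top to bottom."""
    remaining = []
    cursor = page_area.top  # everything above the cursor is already accounted for
    for band in sorted(exclusions, key=lambda b: b.top):
        # Clip the exclusion to the page area.
        top = max(band.top, page_area.top)
        bottom = min(band.bottom, page_area.bottom)
        if bottom <= top:
            continue  # the exclusion lies entirely outside the page area
        if top > cursor:
            remaining.append(Band(cursor, top))
        cursor = max(cursor, bottom)
    if cursor < page_area.bottom:
        remaining.append(Band(cursor, page_area.bottom))
    return remaining


if __name__ == "__main__":
    # Hypothetical dimensions in arbitrary units.
    page_area = Band(0, 700)
    header = Band(0, 60)
    footnotes = Band(580, 640)
    footer = Band(640, 700)
    for band in subtract_bands(page_area, [header, footnotes, footer]):
        print(band, "height:", band.height)
```

With those sample values, a single band is left for the main content, between the bottom of the header and the top of the footnote area.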
In summary, Paged Media and Generated Content for Paged Media paved the
way. But they were never really implemented, except by YesLogic's
PrinceXML, and they now show their limits given the new industries
making extensive use of html and CSS. I am calling for a massive revamp of these
documents based on Regions, Flows, Slots, Exclusions, Grids and the
@page rule. We're just too far behind what live paged media really need
from CSS.
</Daniel>
Received on Sunday, 13 January 2013 11:46:37 UTC