Re: Prioritisation from Johannes Wilm on 2015-08-05 (public-digipub@w3.org from August 2015)

From: Johannes Wilm <johanneswilm@vivliostyle.com>
Date: Thu, 6 Aug 2015 00:34:58 +0200
To: Kaveh Bazargan <kaveh@rivervalleytechnologies.com>
Cc: Dave Cramer <dauwhe@gmail.com>, Richard Ishida <ishida@w3.org>, W3C Digital Publishing Discussion list <public-digipub@w3.org>
Message-ID: <CABkgm-QxVtgZY=Bc9hvKFeHHB66VogLaq1NXHCwBKDuChXS5QQ@mail.gmail.com>
Kaveh's email just reached me now, so I have only seen other parts of the
discussion so far.

On Tue, Aug 4, 2015 at 5:55 PM, Kaveh Bazargan <
kaveh@rivervalleytechnologies.com> wrote:

> Forgive me for a very basic question, but it is a devil's advocate type of
> question. And if this is not the place to ask this perhaps you can direct
> me to any relevant discussions.
>
> My very basic question is, why do we need to "paginate" in the browser in
> the first place? Why not keep the browser for reflowing and interactive
> text, which is what it is good at, and use a standard mark-up pagination
> system (TeX/LaTeX would be my choice) to do what that is good at. If
> another system has already solved problems like footnotes and floating
> figures, what exactly is the drive to reinvent that in the browser?
>

I am myself a LaTeX person and for a lot of things I would agree with you.

However, there are some good reasons to do everything in browsers:

A) You can have one source file for everything and don't need to do
conversion

B) Epub is already tied to HTML, so using LaTeX as the universal format will
likely not work in the long run

C) Most people have a browser installed already, so you don't need to have
them install anything else on their machine

D) Browsers running extra layout JavaScript can be made to render more or
less complex layout of the same sources. So for example you may say that
you just want to show the text and put the footnotes at the bottom in a
single pass. The layout will not be perfect, but on a mobile device that
will give you a quick result. But on a server that is to produce a PDF out
of the same source document, you can have it use a 7-pass process and add
kerning, microtyping, etc.

E) LaTeX document editing is not exactly easy. Many of the LaTeX documents
I wrote 10-15 years ago I cannot simply parse using my current laptop with
the latest TeXLive installed. And most of those are just 5-10 page long
midterm papers for History, Literature or English language (so no advanced
formulas, just citations and plain text). For my books I tried to add a few
minor extras (such as a small flag icon that would be added before and
after the chapter titles), and when I need to rerender them after not
having rendered them for a year or two, I generally have to spend about a
day on various online discussion forums to try to figure out what has
changed in the latest versions of the renderers and how I can get around
those issues. I am not entirely sure, but I imagine that this would have
been easier had the sources been in HTML, as the renderer would at least
render everything that it did understand instead of the all-or-nothing
approach of LaTeX.

I wonder if point D is entirely clear to everyone. When CSS features are
discussed, one of the most important points is of course whether browsers
will implement them. Features that are so complex that the rendering of the
contents of a page will take as long as it takes for a LaTeX renderer to
create a PDF will likely not make it, because speed is more important than
high feature level for browsers for which page-based features are just a
side project. But some will need such complexity for rendering really great
looking output (for example for print output).

From browsers probably the best one can ever expect is that they will
provide fast and simple page layout. But if one has the needed primitives
to allow for more complex solutions in browsers using JavaScript, then one
can still create those sites that spend 5 minutes on rendering the final
output.



On Tue, Aug 4, 2015 at 8:03 PM, Kaveh Bazargan <
kaveh@rivervalleytechnologies.com> wrote:

>
>
> On 4 August 2015 at 18:50, Bill Kasdorf <bkasdorf@apexcovantage.com>
>  wrote:
>
>> A quick clarification. I am quite sure that in her e-mail Deborah is
>> using the term "pagination" to mean "maintaining a record in the digital
>> file of where the page breaks occur in the paginated version of record."
>> That's essential to accessibility and other useful things as well
>> (citations, cross references, indexes, etc. in a world in which print is
>> still considered the version of record and references to its page breaks
>> are common.) That's not the same as making the _*rendered pages*_ in the
>> digital file replicate those in the print.—Bill K
>>
>>
>> [...]
>>
>>
>>
> But Bill, how do we make the page breaks in the electronic version to be
> the same as those of the print pages unless we have the same elements and
> layout? For instance if a floating figure is missing from an electronic
> page, do we just make a short page and break where the paper copy breaks?
> That would lead to very ugly results.
>


The end device should be able to figure out both what the page numbers would
be in the normal-sized output AND what they are on the actual device. All
without having to add extra metadata about where non-explicit page breaks occur.

So basically it renders the pages twice:

A) Once in the original size. This can be done in a way so the end user
doesn't actually have to see it. The page numbers are retrieved from this
version. A could be made to be exactly equal to the print version (or the
other way round: in order to create the print version, one simply prints
out A).

B) A second time for the user to see it in the size appropriate for the
zoom level and screen size.

There are various ways this could be presented to the user in the User
Interface. For example the "Jump to page number" function could be using
the page numbers retrieved from A but then jump to the correct location in
B. And the page numbers shown in the corner of the pages could also be the
ones retrieved from A (that would mean several pages in a row could be
displayed with the same page number and one B page could have two page
numbers if it happens to span over the break between two A pages).
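
A minimal sketch of how the page numbers from A could be mapped onto B,
assuming each rendering can report the character offset in the source at
which every one of its pages starts. The function names and the
offset-based model are illustrative assumptions, not an existing API.

```python
from bisect import bisect_right


def page_index_at(offset, page_starts):
    """Return the 0-based index of the page that contains `offset`.

    `page_starts` is a sorted list of source offsets, one per page,
    giving the position where each page begins (hypothetical model).
    """
    return bisect_right(page_starts, offset) - 1


def jump_to_page(a_page_number, a_starts, b_starts):
    """Return the 0-based B page on which 1-based A page begins."""
    offset = a_starts[a_page_number - 1]
    return page_index_at(offset, b_starts)


def labels_for_b_page(b_index, a_starts, b_starts, doc_length):
    """Return the 1-based A page numbers that a given B page covers."""
    start = b_starts[b_index]
    # The last B page runs to the end of the document.
    if b_index + 1 < len(b_starts):
        end = b_starts[b_index + 1]
    else:
        end = doc_length
    first = page_index_at(start, a_starts)
    last = page_index_at(end - 1, a_starts)
    return list(range(first + 1, last + 2))


# Made-up example: 3 pages in A, 8 smaller pages in B.
doc_length = 3000
a_starts = [0, 1000, 2000]
b_starts = [0, 400, 800, 1200, 1600, 2000, 2400, 2800]

print(jump_to_page(2, a_starts, b_starts))
print(labels_for_b_page(1, a_starts, b_starts, doc_length))
print(labels_for_b_page(2, a_starts, b_starts, doc_length))
```

In this made-up example, B pages 0 and 1 both carry page number 1, and B
page 2 carries numbers 1 and 2 because it spans the break between the
first two A pages. "Jump to page 2" lands on that same B page.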
Received on Wednesday, 5 August 2015 22:35:40 UTC