Re: Prioritisation

From: Kaveh Bazargan <kaveh@rivervalleytechnologies.com>
Date: Thu, 6 Aug 2015 12:05:42 +0100
Message-ID: <CAJ2R9pjCRiX22v_=KsvAeWxUT=b_TFypL49PzL81Ckg-1kBhaA@mail.gmail.com>
To: Johannes Wilm <johanneswilm@vivliostyle.com>
Cc: Dave Cramer <dauwhe@gmail.com>, Richard Ishida <ishida@w3.org>, W3C Digital Publishing Discussion list <public-digipub@w3.org>
Hi Johannes

I am flattered by your comprehensive reply. My comments regarding TeX are
below, but I might not have explained myself well...

I am not suggesting anyone should use TeX code, or even be aware that
TeX/LaTeX is involved. The point is that it is a back-end automated page
make-up engine. So XML/HTML can be converted to PDF very fast and at very
high quality with the TeX engine invisibly doing the work.

Here are my points, distilled:

   - I like the idea of HTML/CSS/Javascript creating fixed pages to be read
   on screen with all kinds of interactivity
   - I still question trying to create footnotes, floating figures and
   tables, and typographic niceties which have primarily evolved for print on
   paper, being done in the browser. To me, floating items only apply to
   print, so interactivity is not needed. Why not pass the info to an
   engine that knows how to do it well?
   - The problems of floating items, complex math, large footnotes that need
   to break across pages, and many other complex pagination problems have
   already been solved in TeX. These are not trivial problems and I worry
   about this working group reinventing the wheel, by starting to specify the
   basics of pagination from scratch. In my opinion, in the end the only way
   to solve the problem is to rewrite TeX in JavaScript!
   - Another problem I have is holding all our information in HTML as
   opposed to XML. I worry about how clean and semantic the content will be.
   After all, HTML was designed to be forgiving, so even bad content will look
   good. We are all excited about the amazing gizmos in HTML and how the
   browser is the new publishing model, but what about 10, 50 or 100 years'
   time? Will these HTML files still make sense? What happens when the browser
   is superseded? I am all for HTML tools and interactivity, but I suggest the
   definitive content should be XML, not HTML.

On 5 August 2015 at 23:34, Johannes Wilm <johanneswilm@vivliostyle.com> wrote:

> Kaveh's email just reached me now, so I have only seen other parts of the
> discussion so far.
> On Tue, Aug 4, 2015 at 5:55 PM, Kaveh Bazargan <
> kaveh@rivervalleytechnologies.com> wrote:
>> Forgive me for a very basic question, but it is a devil's advocate type
>> of question. And if this is not the place to ask this perhaps you can
>> direct me to any relevant discussions.
>> My very basic question is, why do we need to "paginate" in the browser in
>> the first place? Why not keep the browser for reflowing and interactive
>> text, which is what it is good at, and use a standard mark-up pagination
>> system (TeX/LaTeX would be my choice) to do what that is good at. If
>> another system has already solved problems like footnotes and floating
>> figures, what exactly is the drive to reinvent that in the browser?
> I am myself a LaTeX person and for a lot of things I would agree with you.
> However, there are some good reasons to do everything in browsers:
> A) You can have one source file for everything and don't need to do
> conversion
> B) Epub is already tied to HTML, so using LaTeX as the universal format
> will likely not work in the long run
> C) Most people have a browser installed already, so you don't need to have
> them install anything else on their machine
> D) Browsers running extra layout JavaScript can be made to render more or
> less complex layout of the same sources. So for example you may say that
> you just want to show the text and put the footnotes at the bottom in a
> single pass. The layout will not be perfect, but on a mobile device that
> will give you a quick result. But on a server that is to produce a PDF out
> of the same source document, you can have it use a 7-pass process and add
> kerning, microtyping, etc.
> E) LaTeX document editing is not exactly easy. Many of the LaTeX documents
> I wrote 10-15 years ago I cannot simply parse using my current laptop with
> the latest TeXLive installed. And most of those are just 5-10 page long
> midterm papers for History, Literature or English language (so no advanced
> formulas, just citations and plain text). For my books I tried to add a few
> minor extras (such as a small flag icon that would be added before and
> after the chapter titles), and when I need to rerender them after not
> having rendered them for a year or two, I generally have to spend about a
> day on various online discussion forums to try to figure out what has
> changed in the latest versions of the renderers and how I can get around
> those issues. I am not entirely sure, but I imagine that this would have
> been easier had the sources been in HTML, as the renderer would at least
> render everything that it did understand instead of the everything or
> nothing approach of LaTeX.

Actually TeX is the fastest page renderer. Standard TeX files create pages
at over 100 pages a second on a normal laptop, including complex math and
footnotes. And I am surprised you had problems running old files. You must
have been using style files which had not been maintained. The TeX engine
has been frozen for 30 years!

But for this discussion most of that is irrelevant I think.

> I wonder if point D is entirely clear to everyone. When CSS features are
> discussed, one of the most important points is of course whether browsers
> will implement them. Features that are so complex that the rendering of the
> contents of a page will take as long as it takes for a LaTeX renderer to
> create a PDF will likely not make it, because speed is more important than
> a high feature level for browsers for which page-based features are just a
> side project. But some will need such complexity for rendering really great
> looking output (for example for print output).
> From browsers probably the best one can ever expect is that they will
> provide fast and simple page layout. But if one has the needed primitives
> to allow for more complex solutions in browsers using JavaScript, then one
> can still create those sites that spend 5 minutes on rendering the final
> output.
> On Tue, Aug 4, 2015 at 8:03 PM, Kaveh Bazargan <
> kaveh@rivervalleytechnologies.com> wrote:
>> On 4 August 2015 at 18:50, Bill Kasdorf <bkasdorf@apexcovantage.com>
>>  wrote:
>>> A quick clarification. I am quite sure that in her e-mail Deborah is
>>> using the term "pagination" to mean "maintaining a record in the digital
>>> file of where the page breaks occur in the paginated version of record."
>>> That's essential to accessibility and other useful things as well
>>> (citations, cross references, indexes, etc. in a world in which print is
>>> still considered the version of record and references to its page breaks
>>> are common.) That's not the same as making the _*rendered pages*_ in
>>> the digital file replicate those in the print.—Bill K
>>> [...]
>> But Bill, how do we make the page breaks in the electronic version the
>> same as those of the print pages unless we have the same elements and
>> layout? For instance if a floating figure is missing from an electronic
>> page, do we just make a short page and break where the paper copy breaks?
>> That would lead to very ugly results.
> The end device should be able to both figure out what page numbers would
> be in the normal sized output AND what it is on the actual device. All
> without having to add extra metadata about where non-explicit page breaks
> occur.
> So basically it renders the pages twice:
> A) Once in the original size. This can be done in a way so the end user
> doesn't actually have to see it. The page numbers are retrieved from this
> version. A could be made to be exactly equal to the print version (or the
> other way round: in order to create the print version, one simply prints
> out A).
> B) A second time for the user to see it in the size appropriate for the
> zoom level and screen size.
> There are various ways this could be presented to the user in the User
> Interface. For example the "Jump to page number" function could be using
> the page numbers retrieved from A but then jump to the correct location in
> B. And the page numbers shown in the corner of the pages could also be the
> ones retrieved from A (that would mean several pages in a row could be
> displayed with the same page number and one B page could have two page
> numbers if it happens to span over the break between two A pages).
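
The two-render scheme quoted above can be sketched roughly as follows. This is
only an illustration under strong assumptions: the content is treated as a flat
run of characters, and "layout" is faked by cutting it into pages of a fixed
character capacity. The function names, page capacities and document length are
all hypothetical and stand in for whatever a real layout engine would produce.

```python
from bisect import bisect_right

def paginate(total_chars, chars_per_page):
    """Return the start offset of each page (toy stand-in for real layout)."""
    return list(range(0, total_chars, chars_per_page))

def page_containing(starts, offset):
    """1-based number of the page whose range contains `offset`."""
    return bisect_right(starts, offset)

def jump_to_reference_page(ref_page, ref_starts, dev_starts):
    """Device page (rendering B) on which reference page `ref_page` (A) begins."""
    return page_containing(dev_starts, ref_starts[ref_page - 1])

def reference_pages_shown(dev_page, ref_starts, dev_starts, total_chars):
    """Reference page numbers (A) that overlap device page `dev_page` (B)."""
    start = dev_starts[dev_page - 1]
    end = dev_starts[dev_page] if dev_page < len(dev_starts) else total_chars
    first = page_containing(ref_starts, start)
    last = page_containing(ref_starts, end - 1)
    return list(range(first, last + 1))

# Hypothetical document: 10,000 characters.
TOTAL = 10_000
ref_starts = paginate(TOTAL, 3000)  # rendering A: print-sized pages
dev_starts = paginate(TOTAL, 800)   # rendering B: small-screen pages

# "Jump to page 2" uses A's numbering but lands on a B page.
target = jump_to_reference_page(2, ref_starts, dev_starts)
print(target, reference_pages_shown(target, ref_starts, dev_starts, TOTAL))
```

In this toy setup the B page that the jump lands on straddles the break between
A pages 1 and 2, so it would carry both numbers, while the first few B pages
would all show page 1.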

Kaveh Bazargan
River Valley Technologies
+44 7771 824 111
Received on Thursday, 6 August 2015 11:06:31 UTC
