Re: Prioritisation from Kaveh Bazargan on 2015-08-06 (public-digipub@w3.org from August 2015)

From: Kaveh Bazargan <kaveh@rivervalleytechnologies.com>
Date: Thu, 6 Aug 2015 16:16:27 +0100
To: Bill Kasdorf <bkasdorf@apexcovantage.com>
Cc: Johannes Wilm <johanneswilm@vivliostyle.com>, Dave Cramer <dauwhe@gmail.com>, Richard Ishida <ishida@w3.org>, W3C Digital Publishing Discussion list <public-digipub@w3.org>
Message-ID: <CAJ2R9pgUXL0ok6YdTXZuDVkqhavM-XaEz91E0Kpvom5JwJpBng@mail.gmail.com>
Hi Bill

As far as I know TeX is the only *open and free* engine that can handle
sophisticated pagination of complex text (e.g. floating elements, footnotes
that take up more space than the body text, complex math, the highest level
of typography) with full automation.

I know I must be missing something, but I fail to see why we want to go
through several years of development to replicate the above functionality
inside the browser. (A back-end TeX system converting XML to PDF needs no
maintenance or learning.)
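
As a rough illustration of the kind of back-end conversion meant here, the
following is a minimal sketch only, not an actual production pipeline. The
element names (article, title, para, footnote) and the function names are
hypothetical, chosen just for this example. It turns a small XML fragment
into LaTeX source that a TeX engine could then typeset.

```python
import xml.etree.ElementTree as ET

# Characters with special meaning in LaTeX, mapped to their escaped forms.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in (text or ""))


def inline(elem):
    """Render mixed content; <footnote> is a hypothetical tag name."""
    out = [escape(elem.text)]
    for child in elem:
        if child.tag == "footnote":
            out.append(r"\footnote{" + inline(child) + "}")
        else:
            out.append(inline(child))
        out.append(escape(child.tail))
    return "".join(out)


def xml_to_latex(xml_source):
    """Convert a tiny, assumed XML vocabulary to LaTeX source text."""
    root = ET.fromstring(xml_source)
    lines = [r"\documentclass{article}", r"\begin{document}"]
    for child in root:
        if child.tag == "title":
            lines.append(r"\section*{" + inline(child) + "}")
        elif child.tag == "para":
            lines.append(inline(child))
            lines.append("")  # blank line ends the paragraph in TeX
    lines.append(r"\end{document}")
    return "\n".join(lines)
```

The returned string would be written to a file and handed to a TeX engine,
which then takes care of page breaks and footnote placement on its own.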

On 6 August 2015 at 15:53, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:

> Just pointing out that there are many sophisticated pagination engines out
> there, TeX is just one example. Very sophisticated, automated, complex page
> makeup has been done on proprietary systems since the 1990s. Several such
> systems are still currently in wide use. Many of the issues that we're
> addressing in the context of the Open Web Platform today were considered
> solved problems decades ago in those systems. That doesn't mean we don't
> want to be able to provide that kind of sophisticated page makeup natively
> on the Web and in Web-based technologies, _*without*_ requiring separate
> software systems (with their attendant specializations, learning curves,
> system implementations, maintenance, etc.)—Bill Kasdorf
>
>
>
> *From:* Kaveh Bazargan [mailto:kaveh@rivervalleytechnologies.com]
> *Sent:* Thursday, August 06, 2015 7:06 AM
> *To:* Johannes Wilm
> *Cc:* Dave Cramer; Richard Ishida; W3C Digital Publishing Discussion list
> *Subject:* Re: Prioritisation
>
>
>
> Hi Johannes
>
>
>
> I am flattered by your comprehensive reply. My comments regarding TeX are
> below, but I might not have explained myself well...
>
>
>
> I am not suggesting anyone should use TeX code, or even be aware that
> TeX/LaTeX is involved. The point is that it is a back end automated page
> make up engine. So XML/HTML can be converted to PDF very fast and at very
> high quality with the TeX engine invisibly doing the work.
>
>
>
> Here are my points, distilled:
>
>    - I like the idea of HTML/CSS/Javascript creating fixed pages to be
>    read on screen with all kinds of interactivity
>    - I still question trying to create footnotes, floating figures and
>    tables, and typographic niceties which have primarily evolved for print on
>    paper, being done in the browser. To me, floating items only apply to
>    print, so no interactivity is needed. Why not pass the info to an
>    engine that knows how to do it well?
>    - The problems of floating items, complex math, large footnotes that
>    need to break across pages, and many other complex pagination problems have
>    already been solved in TeX. These are not trivial problems and I worry
>    about this working group reinventing the wheel, by starting to specify the
>    basics of pagination from scratch. In my opinion, in the end the only way
>    to solve the problem is to rewrite TeX in JavaScript!
>    - Another problem I have is holding all our information in HTML as
>    opposed to XML. I worry about how clean and semantic the content will be.
>    After all, HTML was designed to be forgiving, so even bad content will look
>    good. We are all excited about the amazing gizmos in html and how the
>    browser is the new publishing model, but what about in 10, 50 or 100 years'
>    time? Will these html files still make sense? What happens when the browser
>    is superseded? I am all for html tools and interactivity, but I suggest the
>    definitive content should be XML, not HTML.
>
>
>
> On 5 August 2015 at 23:34, Johannes Wilm <johanneswilm@vivliostyle.com>
> wrote:
>
> Kaveh's email only reached me now, so I have only seen other parts of the
> discussion so far.
>
>
>
> On Tue, Aug 4, 2015 at 5:55 PM, Kaveh Bazargan <
> kaveh@rivervalleytechnologies.com> wrote:
>
> Forgive me for a very basic question, but it is a devil's advocate type of
> question. And if this is not the place to ask this perhaps you can direct
> me to any relevant discussions.
>
>
>
> My very basic question is, why do we need to "paginate" in the browser in
> the first place? Why not keep the browser for reflowing and interactive
> text, which is what it is good at, and use a standard mark-up pagination
> system (TeX/LaTeX would be my choice) to do what that is good at. If
> another system has already solved problems like footnotes and floating
> figures, what exactly is the drive to reinvent that in the browser?
>
>
>
> I am myself a LaTeX person and for a lot of things I would agree with you.
>
>
>
> However, there are some good reasons to do everything in browsers:
>
>
>
> A) You can have one source file for everything and don't need to do
> conversion
>
>
>
> B) Epub is already tied to HTML, so using LaTeX as the universal format
> will likely not work in the long run
>
>
>
> C) Most people have a browser installed already, so you don't need to have
> them install anything else on their machine
>
>
>
> D) Browsers running extra layout JavaScript can be made to render more or
> less complex layout of the same sources. So for example you may say that
> you just want to show the text and put the footnotes at the bottom in a
> single pass. The layout will not be perfect, but on a mobile device that
> will give you a quick result. But on a server that is to produce a PDF out
> of the same source document, you can have it use a 7-pass process and add
> kerning, microtyping, etc.
>
>
>
> E) LaTeX document editing is not exactly easy. Many of the LaTeX documents
> I wrote 10-15 years ago I cannot simply parse using my current laptop with
> the latest TeXLive installed. And most of those are just 5-10 page long
> midterm papers for History, Literature or English language (so no advanced
> formulas, just citations and plain text). For my books I tried to add a few
> minor extras (such as a small flag icon that would be added before and
> after the chapter titles), and when I need to rerender them after not
> having rendered them for a year or two, I generally have to spend about a
> day on various online discussion forums to try to figure out what has
> changed in the latest versions of the renderers and how I can get around
> those issues. I am not entirely sure, but I imagine that this would have
> been easier had the sources been in HTML, as the renderer would at least
> render everything that it did understand instead of the everything or
> nothing approach of LaTeX.
>
>
>
> Actually TeX is the fastest page renderer. Standard TeX files create pages
> at over 100 pages a second on a normal laptop, including complex math and
> footnotes. And I am surprised you had problems running old files. You must
> have been using style files which had not been maintained. The TeX engine
> has been frozen for 30 years!
>
>
>
> But for this discussion most of that is irrelevant I think.
>
>
>
>
>
> I wonder if point D is entirely clear to everyone. When CSS features are
> discussed, one of the most important points is of course whether browsers
> will implement them. Features that are so complex that the rendering of the
> contents of a page will take as long as it takes for a LaTeX renderer to
> create a PDF will likely not make it, because speed is more important than
> high feature level for browsers for which page-based features are just a
> side project. But some will need such complexity for rendering really great
> looking output (for example for print output).
>
>
>
> From browsers probably the best one can ever expect is that they will
> provide fast and simple page layout. But if one has the needed primitives
> to allow for more complex solutions in browsers using JavaScript, then one
> can still create those sites that spend 5 minutes on rendering the final
> output.
>
>
>
>
>
> On Tue, Aug 4, 2015 at 8:03 PM, Kaveh Bazargan <
> kaveh@rivervalleytechnologies.com> wrote:
>
>
>
>
>
> On 4 August 2015 at 18:50, Bill Kasdorf <bkasdorf@apexcovantage.com
> > wrote:
>
> A quick clarification. I am quite sure that in her e-mail Deborah is using
> the term "pagination" to mean "maintaining a record in the digital file of
> where the page breaks occur in the paginated version of record." That's
> essential to accessibility and other useful things as well (citations,
> cross references, indexes, etc. in a world in which print is still
> considered the version of record and references to its page breaks are
> common.) That's not the same as making the _*rendered pages*_ in the
> digital file replicate those in the print.—Bill K
>
>
>
> [...]
>
>
>
>
>
> But Bill, how do we make the page breaks in the electronic version
> the same as those of the print pages unless we have the same elements and
> layout? For instance if a floating figure is missing from an electronic
> page, do we just make a short page and break where the paper copy breaks?
> That would lead to very ugly results.
>
>
>
>
>
> The end device should be able to both figure out what page numbers would
> be in the normal sized output AND what it is on the actual device. All
> without having to add extra metadata about where non-explicit page breaks
> occur.
>
> So basically it renders the pages twice:
>
> A) Once in the original size. This can be done in a way so the end user
> doesn't actually have to see it. The page numbers are retrieved from this
> version. A could be made to be exactly equal to the print version (or the
> other way round: in order to create the print version, one simply prints
> out A).
>
> B) A second time for the user to see it in the size appropriate for the
> zoom level and screen size.
>
> There are various ways this could be presented to the user in the User
> Interface. For example the "Jump to page number" function could be using
> the page numbers retrieved from A but then jump to the correct location in
> B. And the page numbers shown in the corner of the pages could also be the
> ones retrieved from A (that would mean several pages in a row could be
> displayed with the same page number and one B page could have two page
> numbers if it happens to span over the break between two A pages).
>
>
>
>
>
>
>
> --
>
> Kaveh Bazargan
>
> Director
>
> River Valley Technologies
>
> @kaveh1000
> +44 7771 824 111
>
> www.rivervalleytechnologies.com
>
> www.bazargan.org
>
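
The two-pass scheme Johannes describes in the quoted thread (render once at
print size, "A", to obtain page numbers, then again at device size, "B", for
display) can be sketched roughly as follows. This is only an illustration
under simplifying assumptions: the document is modelled as a run of equal
lines, a fixed lines-per-page count stands in for real layout, and all
function names are hypothetical.

```python
from bisect import bisect_right


def page_starts(total_lines, lines_per_page):
    """Line offsets at which each page starts (stand-in for real layout)."""
    return list(range(0, total_lines, lines_per_page))


def page_of(starts, offset):
    """1-based number of the page that contains the given line offset."""
    return bisect_right(starts, offset)


def a_numbers_on_b_page(a_starts, b_starts, b_page, total_lines):
    """Print (A) page numbers covered by device (B) page b_page."""
    first = b_starts[b_page - 1]
    if b_page < len(b_starts):
        last = b_starts[b_page] - 1
    else:
        last = total_lines - 1  # final B page runs to the end of the text
    return list(range(page_of(a_starts, first), page_of(a_starts, last) + 1))


def jump_to_a_page(a_starts, b_starts, a_page):
    """Device (B) page on which print (A) page a_page begins."""
    return page_of(b_starts, a_starts[a_page - 1])


# Example: 100 lines, 40 per print page (A), 15 per device page (B).
a = page_starts(100, 40)
b = page_starts(100, 15)
labels = a_numbers_on_b_page(a, b, 3, 100)  # B page 3 spans an A page break
target = jump_to_a_page(a, b, 2)            # "jump to page 2" lands on a B page
```

In this toy model, several consecutive B pages carry the same A page number,
and a B page that straddles an A page break carries two, which matches the
behaviour described in the quoted message.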



-- 
Kaveh Bazargan
Director
River Valley Technologies
@kaveh1000
+44 7771 824 111
www.rivervalleytechnologies.com
www.bazargan.org
Received on Thursday, 6 August 2015 15:17:17 UTC