Re: Prioritisation from Kaveh Bazargan on 2015-08-06 (public-digipub@w3.org from August 2015)

From: Kaveh Bazargan <kaveh@rivervalleytechnologies.com>
Date: Thu, 6 Aug 2015 16:40:01 +0100
To: "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
Cc: Bill Kasdorf <bkasdorf@apexcovantage.com>, Johannes Wilm <johanneswilm@vivliostyle.com>, Dave Cramer <dauwhe@gmail.com>, Richard Ishida <ishida@w3.org>, W3C Digital Publishing Discussion list <public-digipub@w3.org>
Message-ID: <CAJ2R9pj2omU3MBDBzMeqizsNBiMV1VpChRBrLWQHWVJhX0gP_w@mail.gmail.com>
Thanks for teaching me a new word: proselytize :-)

I don't think you have quite grasped what I am saying, but I get the
message. ;-)

On 6 August 2015 at 16:25, Siegman, Tzviya - Hoboken <tsiegman@wiley.com>
wrote:

> While it is great to see so much discussion on our list, I feel that we
> are having a religious argument not a technical one.
>
>
>
> Many people are happy to use TeX, and that is fine. Many people would like
> to shift their workflows to HTML and CSS, and that is what we are focusing
> on in the DPUB IG and CSS WG.
>
>
>
> Let’s try not to proselytize.
>
>
>
>
>
> *Tzviya Siegman*
>
> Digital Book Standards & Capabilities Lead
>
> Wiley
>
> 201-748-6884
>
> tsiegman@wiley.com
>
>
>
> *From:* Kaveh Bazargan [mailto:kaveh@rivervalleytechnologies.com]
> *Sent:* Thursday, August 06, 2015 11:16 AM
> *To:* Bill Kasdorf
> *Cc:* Johannes Wilm; Dave Cramer; Richard Ishida; W3C Digital Publishing
> Discussion list
> *Subject:* Re: Prioritisation
>
>
>
> Hi Bill
>
>
>
> As far as I know TeX is the only *open and free* engine that can handle
> sophisticated pagination of complex text (e.g. floating elements, footnotes
> that take more space than the body text, complex math, highest level of
> typography) with full automation.
>
>
>
> And I know I am missing something but I fail to see why we want to go
> through several years of development to replicate the above
> functionalities inside the browser. (A back end TeX system converting XML
> to PDF needs no maintenance or learning).
>
>
>
> On 6 August 2015 at 15:53, Bill Kasdorf <bkasdorf@apexcovantage.com>
> wrote:
>
> Just pointing out that there are many sophisticated pagination engines out
> there, TeX is just one example. Very sophisticated, automated, complex page
> makeup has been done on proprietary systems since the 1990s. Several such
> systems are still currently in wide use. Many of the issues that we're
> addressing in the context of the Open Web Platform today were considered
> solved problems decades ago in those systems. That doesn't mean we don't
> want to be able to provide that kind of sophisticated page makeup natively
> on the Web and in Web-based technologies, _*without*_ requiring separate
> software systems (with their attendant specializations, learning curves,
> system implementations, maintenance, etc.)—Bill Kasdorf
>
>
>
> *From:* Kaveh Bazargan [mailto:kaveh@rivervalleytechnologies.com]
> *Sent:* Thursday, August 06, 2015 7:06 AM
> *To:* Johannes Wilm
> *Cc:* Dave Cramer; Richard Ishida; W3C Digital Publishing Discussion list
> *Subject:* Re: Prioritisation
>
>
>
> Hi Johannes
>
>
>
> I am flattered by your comprehensive reply. My comments regarding TeX are
> below, but I might not have explained myself well...
>
>
>
> I am not suggesting anyone should use TeX code, or even be aware that
> TeX/LaTeX is involved. The point is that it is a back end automated page
> make up engine. So XML/HTML can be converted to PDF very fast and at very
> high quality with the TeX engine invisibly doing the work.
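As a rough illustration of the back-end idea described above, here is a minimal sketch of turning XML into LaTeX source that a TeX engine could then typeset. The element names (`article`, `title`, `para`, `footnote`, `em`) and the function names are hypothetical, chosen only for this example; a production converter would handle far more structure.

```python
import xml.etree.ElementTree as ET

# Characters that have special meaning in TeX and must be escaped.
_TEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape TeX special characters in plain text."""
    return "".join(_TEX_SPECIALS.get(ch, ch) for ch in text)


def inline_to_tex(elem):
    """Convert an element's mixed content (text plus inline children)."""
    parts = [tex_escape(elem.text or "")]
    for child in elem:
        if child.tag == "footnote":      # hypothetical element name
            parts.append(r"\footnote{" + inline_to_tex(child) + "}")
        elif child.tag == "em":          # hypothetical element name
            parts.append(r"\emph{" + inline_to_tex(child) + "}")
        else:                            # unknown inline: keep its text only
            parts.append(inline_to_tex(child))
        parts.append(tex_escape(child.tail or ""))
    return "".join(parts)


def article_to_tex(xml_source):
    """Build a complete LaTeX document from a small XML article."""
    root = ET.fromstring(xml_source)
    body = []
    for node in root:
        if node.tag == "title":
            body.append(r"\section*{" + inline_to_tex(node) + "}")
        elif node.tag == "para":
            body.append(inline_to_tex(node))
    return "\n".join([
        r"\documentclass{article}",
        r"\begin{document}",
        "\n\n".join(body),
        r"\end{document}",
    ]) + "\n"
```

The resulting string could be written to a file and passed to a TeX engine to produce the PDF; that step is left out here so the sketch stays self-contained.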
>
>
>
> Here are my points, distilled:
>
>    - I like the idea of HTML/CSS/Javascript creating fixed pages to be
>    read on screen with all kinds of interactivity
>    - I still question trying to create footnotes, floating figures and
>    tables, and typographic niceties which have primarily evolved for print on
>    paper, being done in the browser. To me, floating items only apply to
>    print, so interactivity is not needed. Why not pass the info to an
>    engine that knows how to do it well?
>    - The problem of floating items, complex math, large footnotes that
>    need to break across pages, and many other complex pagination problems have
>    already been solved in TeX. These are not trivial problems and I worry
>    about this working group reinventing the wheel, by starting to specify the
>    basics of pagination from scratch. In my opinion, in the end the only way
>    to solve the problem is to rewrite TeX in JavaScript!
>    - Another problem I have is holding all our information in HTML as
>    opposed to XML. I worry about how clean and semantic the content will be.
>    After all, HTML was designed to be forgiving, so even bad content will look
>    good. We are all excited about the amazing gizmos in html and how the
>    browser is the new publishing model, but what about 10, 50 or 100 years
>    time? Will these html files still make sense? What happens when the browser
>    is superseded? I am all for html tools and interactivity, but I suggest the
>    definitive content should be XML, not HTML.
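The contrast between a forgiving HTML parser and a strict XML parser, raised in the last point above, can be sketched in a few lines. This uses Python's standard `html.parser` only as a stand-in for a lenient parser (it is not a browser's HTML parser), and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup with two elements that are never closed.
SLOPPY = "<p>An unclosed <em>emphasis and paragraph"


class TagCollector(HTMLParser):
    """Records every start tag the lenient parser is willing to accept."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_as_html(markup):
    """Feed markup to the lenient parser and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

The lenient parser reports the tags without complaint, while the XML parser rejects the same input outright, which is the property being argued for when the definitive content is kept in XML.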
>
>
>
> On 5 August 2015 at 23:34, Johannes Wilm <johanneswilm@vivliostyle.com>
> wrote:
>
> Kaveh's email reached me just now, so I have only seen other parts of the
> discussion so far.
>
>
>
> On Tue, Aug 4, 2015 at 5:55 PM, Kaveh Bazargan <
> kaveh@rivervalleytechnologies.com> wrote:
>
> Forgive me for a very basic question, but it is a devil's advocate type of
> question. And if this is not the place to ask this perhaps you can direct
> me to any relevant discussions.
>
>
>
> My very basic question is, why do we need to "paginate" in the browser in
> the first place? Why not keep the browser for reflowing and interactive
> text, which is what it is good at, and use a standard mark-up pagination
> system (TeX/LaTeX would be my choice) to do what that is good at. If
> another system has already solved problems like footnotes and floating
> figures, what exactly is the drive to reinvent that in the browser?
>
>
>
> I am myself a LaTeX person and for a lot of things I would agree with you.
>
>
>
> However, there are some good reasons to do everything in browsers:
>
>
>
> A) You can have one source file for everything and don't need to do
> conversion
>
>
>
> B) Epub is already tied to HTML, so using LaTeX as the universal format
> will likely not work in the long run
>
>
>
> C) Most people have a browser installed already, so you don't need to have
> them install anything else on their machine
>
>
>
> D) Browsers running extra layout JavaScript can be made to render more or
> less complex layout of the same sources. So for example you may say that
> you just want to show the text and put the footnotes at the bottom in a
> single pass. The layout will not be perfect, but on a mobile device that
> will give you a quick result. But on a server that is to produce a PDF out
> of the same source document, you can have it use a 7-pass process and add
> kerning, microtyping, etc.
>
>
>
> E) LaTeX document editing is not exactly easy. Many of the LaTeX documents
> I wrote 10-15 years ago I cannot simply parse using my current laptop with
> the latest TeXLive installed. And most of those are just 5-10 page long
> midterm papers for History, Literature or English language (so no advanced
> formulas, just citations and plain text). For my books I tried to add a few
> minor extras (such as a small flag icon that would be added before and
> after the chapter titles), and when I need to rerender them after not
> having rendered them for a year or two, I generally have to spend about a
> day on various online discussion forums to try to figure out what has
> changed in the latest versions of the renderers and how I can get around
> those issues. I am not entirely sure, but I imagine that this would have
> been easier had the sources been in HTML, as the renderer would at least
> render everything that it did understand instead of the everything or
> nothing approach of LaTeX.
>
>
>
> Actually TeX is the fastest page renderer. Standard TeX files create pages
> at over 100 pages a second on a normal laptop, including complex math and
> footnotes. And I am surprised you had problems running old files. You must
> have been using style files which had not been maintained. The TeX engine
> has been frozen for 30 years!
>
>
>
> But for this discussion most of that is irrelevant I think.
>
>
>
>
>
> I wonder if point D is entirely clear to everyone. When CSS features are
> discussed, one of the most important points is of course whether browsers
> will implement them. Features that are so complex that the rendering of the
> contents of a page will take as long as it takes for a LaTeX renderer to
> create a PDF will likely not make it, because speed is more important than
> high feature level for browsers for which page-based features are just a
> side project. But some will need such complexity for rendering really great
> looking output (for example for print output).
>
>
>
> From browsers probably the best one can ever expect is that they will
> provide fast and simple page layout. But if one has the needed primitives
> to allow for more complex solutions in browsers using JavaScript, then one
> can still create those sites that spend 5 minutes on rendering the final
> output.
>
>
>
>
>
> On Tue, Aug 4, 2015 at 8:03 PM, Kaveh Bazargan <
> kaveh@rivervalleytechnologies.com> wrote:
>
>
>
>
>
> On 4 August 2015 at 18:50, Bill Kasdorf <bkasdorf@apexcovantage.com
> > wrote:
>
> A quick clarification. I am quite sure that in her e-mail Deborah is using
> the term "pagination" to mean "maintaining a record in the digital file of
> where the page breaks occur in the paginated version of record." That's
> essential to accessibility and other useful things as well (citations,
> cross references, indexes, etc. in a world in which print is still
> considered the version of record and references to its page breaks are
> common.) That's not the same as making the _*rendered pages*_ in the
> digital file replicate those in the print.—Bill K
>
>
>
> [...]
>
>
>
>
>
> But Bill, how do we make the page breaks in the electronic version
> the same as those of the print pages unless we have the same elements and
> layout? For instance if a floating figure is missing from an electronic
> page, do we just make a short page and break where the paper copy breaks?
> That would lead to very ugly results.
>
>
>
>
>
> The end device should be able to both figure out what page numbers would
> be in the normal sized output AND what it is on the actual device. All
> without having to add extra metadata about where non-explicit page breaks
> occur.
>
> So basically it renders the pages twice:
>
> A) Once in the original size. This can be done in a way so the end user
> doesn't actually have to see it. The page numbers are retrieved from this
> version. A could be made to be exactly equal to the print version (or the
> other way round: in order to create the print version, one simply prints
> out A).
>
> B) A second time for the user to see it in the size appropriate for the
> zoom level and screen size.
>
> There are various ways this could be presented to the user in the User
> Interface. For example the "Jump to page number" function could be using
> the page numbers retrieved from A but then jump to the correct location in
> B. And the page numbers shown in the corner of the pages could also be the
> ones retrieved from A (that would mean several pages in a row could be
> displayed with the same page number and one B page could have two page
> numbers if it happens to span over the break between two A pages).
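The two-rendering scheme above can be sketched as a lookup between two lists of page breaks. This is only an illustration under stated assumptions: each rendering is reduced to a sorted list of character offsets at which its pages start, and all names and numbers here are hypothetical rather than taken from any existing reading system.

```python
from bisect import bisect_right


def page_of(page_starts, offset):
    """1-based number of the page that contains the given text offset."""
    return bisect_right(page_starts, offset)


def print_pages_for_device_page(a_starts, b_starts, total_len, b_page):
    """Range (first, last) of A page numbers covered by one B page."""
    start = b_starts[b_page - 1]
    # A page ends where the next one starts, or at the end of the text.
    end = b_starts[b_page] if b_page < len(b_starts) else total_len
    return page_of(a_starts, start), page_of(a_starts, end - 1)


def device_page_for_print_page(a_starts, b_starts, a_page):
    """B page to display for a "jump to page number" request on an A page."""
    return page_of(b_starts, a_starts[a_page - 1])


# Rendering A: three print-sized pages of 1000 characters each.
A_STARTS = [0, 1000, 2000]
# Rendering B: the same text on a small screen, 400 characters per page.
B_STARTS = [0, 400, 800, 1200, 1600, 2000, 2400]
TOTAL_LEN = 2800
```

With these sample values, device page 3 covers offsets 800 to 1199 and therefore carries two print page numbers (1 and 2), while device pages 1 and 2 both show print page 1. A jump to print page 2 lands on device page 3.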
>
>
>
>
>
>
>
> --
>
> Kaveh Bazargan
>
> Director
>
> River Valley Technologies
>
> @kaveh1000
> +44 7771 824 111
>
> www.rivervalleytechnologies.com
>
> www.bazargan.org
>



-- 
Kaveh Bazargan
Director
River Valley Technologies
@kaveh1000
+44 7771 824 111
www.rivervalleytechnologies.com
www.bazargan.org
Received on Thursday, 6 August 2015 15:40:50 UTC