Re: Prioritisation

From: Johannes Wilm <johanneswilm@vivliostyle.com>
Date: Thu, 6 Aug 2015 15:58:54 +0200
Message-ID: <CABkgm-Rok6ZNSf1_FhasviS6-Hvory1jkNv2Wqs6ncXVsqGP=g@mail.gmail.com>
To: Kaveh Bazargan <kaveh@rivervalleytechnologies.com>
Cc: Dave Cramer <dauwhe@gmail.com>, Richard Ishida <ishida@w3.org>, W3C Digital Publishing Discussion list <public-digipub@w3.org>
On Thu, Aug 6, 2015 at 1:05 PM, Kaveh Bazargan <
kaveh@rivervalleytechnologies.com> wrote:

> Hi Johannes
>
> I am flattered by your comprehensive reply. My comments regarding TeX are
> below, but I might not have explained myself well...
>
> I am not suggesting anyone should use TeX code, or even be aware that
> TeX/LaTeX is involved. The point is that it is a back end automated page
> make up engine. So XML/HTML can be converted to PDF very fast and at very
> high quality with the TeX engine invisibly doing the work.
>

Ok, so you are proposing converting XML to LaTeX and Epub/HTML on a backend
system? The main problem with that is that the conversion mechanisms just
about always need human intervention, and that it is hard or even
impossible to get XML input files from authors.


>
> Here are my points, distilled:
>
>    - I like the idea of HTML/CSS/Javascript creating fixed pages to be
>    read on screen with all kinds of interactivity
>    - I still question trying to create footnotes, floating figures and
>    tables, and typographic niceties which have primarily evolved for print on
>    paper, being done in the browser. To me, floating items only apply to
>    print, so interactivity is not needed. Why not pass the info to an
>    engine that knows how to do it well?
>

Is there not also a point in having footnotes and floating figures in
ebooks (and have those still work when the user changes the font size
level)?

>
>    - The problem of floating items, complex math, large footnotes that
>    need to break across pages, and many other complex pagination problems have
>    already been solved in TeX. These are not trivial problems and I worry
>    about this working group reinventing the wheel, by starting to specify the
>    basics of pagination from scratch. In my opinion, in the end the only way
>    to solve the problem is to rewrite TeX in JavaScript!
>
>
I have also been thinking of LaTeX in JavaScript. But as far as I can tell,
in 2015 that would still be too slow. TeXLive is a few GB in size, and
if the user should wait for a few GB to download before the page is
rendered, that likely wouldn't work. In a few years, when a few GB is
nothing and processors are faster, this may be a viable alternative.
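
To put rough numbers on the download concern, here is a small
back-of-the-envelope sketch. The 3 GB size and the connection speeds are
assumptions chosen for illustration, not measurements, and the function
name download_seconds is made up for this example.

```python
def download_seconds(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Seconds needed to transfer size_gb gigabytes at bandwidth_mbit_s Mbit/s."""
    # 1 GB is taken as 1000 MB, and 1 byte as 8 bits.
    size_megabits = size_gb * 1000 * 8
    return size_megabits / bandwidth_mbit_s


# Assumed figures: a 3 GB distribution over three hypothetical connections.
for mbit in (10, 100, 1000):
    minutes = download_seconds(3, mbit) / 60
    print(f"{mbit:>5} Mbit/s: {minutes:.1f} minutes")
```

Under these assumptions even a fast connection leaves the reader waiting
far longer than a page load normally takes, before any typesetting starts.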

As for these other items: I agree, they are quite complex. But it should be
possible to do a "simple version" of most of those features that renders
quickly, and then a more complex one that creates even smoother designs for
situations where one has the time.

>
>    - Another problem I have is holding all our information in HTML as
>    opposed to XML. I worry about how clean and semantic the content will be.
>    After all, HTML was designed to be forgiving, so even bad content will look
>    good. We are all excited about the amazing gizmos in html and how the
>    browser is the new publishing model, but what about 10, 50 or 100 years
>    time? Will these html files still make sense? What happens when the browser
>    is superseded? I am all for html tools and interactivity, but I suggest the
>    definitive content should be XML, not HTML.
>

Good question. I would guess that it depends on which features. H1-H6 and P
elements will likely still be readable for a long time. XML files will also
be readable, but who will be able to turn them into something visually
attractive? Already now there is no perfect WYSIWYG XML editor, and as we
can see with XSL-FO, the future of turning XML into PDFs via a common
standard is not secure either. And we are just a handful of years past
"peak XML standardization" and likely still haven't reached "peak XML
usage". So how about in 200 years?

The situation with LaTeX is somewhat similar (see below). No one can quite
know what will happen and which standard survives, but with HTML we at
least know that the number of users is extremely high so that there is a
certain chance that those files will survive for a good while. That being
said, the safest is probably to store files in several formats for
long-term storage. In Fidus Writer we therefore used both simple HTML and
simple LaTeX as storage formats for user content.
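
The idea of keeping the same content in two simple formats can be sketched
roughly as follows. This is not Fidus Writer's actual code; the block
structure (a list of kind/text pairs) and the names escape_latex, to_html
and to_latex are assumptions made up for this illustration.

```python
import html

# Characters with special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    # Escape character by character so a replacement is never escaped again.
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def to_html(blocks):
    """Render (kind, text) blocks as simple HTML: headings as h1, the rest as p."""
    parts = []
    for kind, text in blocks:
        tag = "h1" if kind == "heading" else "p"
        parts.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(parts)


def to_latex(blocks):
    """Render the same blocks as simple LaTeX: headings as \\section, the rest as paragraphs."""
    parts = []
    for kind, text in blocks:
        body = escape_latex(text)
        parts.append(f"\\section{{{body}}}" if kind == "heading" else body)
    return "\n\n".join(parts)


# Hypothetical document content used only for this example.
document = [("heading", "Costs & benefits"), ("paragraph", "About 50% done.")]
print(to_html(document))
print(to_latex(document))
```

Both outputs use only the most basic constructs of each format, which is
the point: the simpler the stored markup, the better the odds that at
least one copy stays readable.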


> Actually TeX is the fastest page renderer. Standard TeX files create pages
> at over 100 pages a second on a normal laptop, including complex math and
> footnotes. And I am surprised you had problem running old files. You must
> have been using style files which had not been maintained. The TeX engine
> has been frozen for 30 years!
>
>
Yes, and that's why I thought LaTeX was a great idea some 15 years ago. But
then I suddenly had to open some files from 1996 in 2003 created by someone
else, and I spent about a week figuring out how to rewrite the macros and
going through the files by hand fixing small things that had changed.
Shortly thereafter I received a file that was just a few months old, but
had been written on a Mac (I was running Linux) and the line endings were
different so I had to figure out how to convert them.
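
A conversion of that kind can be sketched in a few lines of Python. This is
only an illustration, not the approach used at the time. It assumes the
file used classic Mac OS line endings (a bare carriage return), and the
function name normalize_line_endings is made up for this example.

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF (Windows) and bare CR (classic Mac OS) to LF (Unix)."""
    # Handle CRLF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    # Hypothetical file content with CR-only line endings.
    sample = b"\\documentclass{article}\r\\begin{document}\rHello\r\\end{document}\r"
    print(normalize_line_endings(sample).decode("ascii"))
```

Working on bytes rather than decoded text keeps the conversion independent
of whatever character encoding the file happens to use.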

Then along came direct support for other characters without using
shorthands, by using XeTeX, and support for TTF fonts in LuaTeX, just in
slightly different ways. And again lots of stuff needed to be changed by
hand or by a script which I would spend up to a few days developing, etc.
Then suddenly the maintainer of the main bibliography package I had been
using, biblatex, disappeared into thin air. Others eventually tracked him
down and took over package maintainership.

And even a few months ago, when I acquired a new laptop with a new version
of Linux Mint, I couldn't just use my CV compiler[1] as I used to, because
the version of TeXLive that is available in the package manager has some
bugs that have been fixed upstream but haven't yet been fixed in the
version available in the package manager, so I needed to add some extra
lines of code I found on some random website.

Unlike help for HTML, which can easily be found in books and online
documents, LaTeX help can mostly be found only in obscure places, as the
information about how to do one particular thing correctly at times
exists only in the heads of 2-3 developers worldwide.

Of course it has always been possible to get at the content, and with some
time spent in forums on the internet, it's always fixable in the end. For a
big organization that can afford a development team of 5-6 people who spend
all their time on this, paying for such conversions is likely no big deal.
But the question is of course whether LaTeX will continue to be developed,
or whether at some stage too many will say "well, latex is really good
looking, but HTML can do just about all of it, and it's a lot easier for me
to understand how to modify it, so I'll stick with that". At least for now
it looks like HTML has the advantage in numbers.
Received on Thursday, 6 August 2015 13:59:33 UTC
