
Re: Prioritisation

From: Kaveh Bazargan <kaveh@rivervalleytechnologies.com>
Date: Thu, 6 Aug 2015 15:49:47 +0100
Message-ID: <CAJ2R9pgO-j9bxMTrTXw=rgsEW-zJzVbZjGUy1kPTSK+Xpb7=EA@mail.gmail.com>
To: Johannes Wilm <johanneswilm@vivliostyle.com>
Cc: Dave Cramer <dauwhe@gmail.com>, Richard Ishida <ishida@w3.org>, W3C Digital Publishing Discussion list <public-digipub@w3.org>
On 6 August 2015 at 14:58, Johannes Wilm <johanneswilm@vivliostyle.com> wrote:

> On Thu, Aug 6, 2015 at 1:05 PM, Kaveh Bazargan <
> kaveh@rivervalleytechnologies.com> wrote:
>> Hi Johannes
>> I am flattered by your comprehensive reply. My comments regarding TeX are
>> below, but I might not have explained myself well...
>> I am not suggesting anyone should use TeX code, or even be aware that
>> TeX/LaTeX is involved. The point is that it is a back end automated page
>> make up engine. So XML/HTML can be converted to PDF very fast and at very
>> high quality with the TeX engine invisibly doing the work.
> Ok, so you are proposing converting XML to LaTeX and Epub/HTML on a
> backend system? The main problem with that is that the conversion mechanisms
> just about always need human intervention, and that it's hard, if not
> impossible, to get XML input files from authors.

But conversion of XML to a fixed-layout view is the same problem as conversion
of HTML to a fixed layout, is it not? And that is the aim of this group. Would
it not need the same human intervention? The human intervention is needed
because publishers want the same look their journal has had for decades. With
a little modification the process can be entirely automated.
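As a toy illustration of what such an invisible back end might do, here is a minimal Python sketch that wraps semantic XML in a LaTeX shell for a TeX engine to paginate. The element names and template are invented for the example; real publisher XML (e.g. JATS) and real pipelines are far richer:

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic article fragment, purely for illustration.
ARTICLE = """<article>
  <title>On Pagination</title>
  <para>Footnotes and floats are handled by the TeX engine.</para>
</article>"""

def xml_to_latex(xml_text):
    """Wrap semantic XML in a minimal LaTeX document for a TeX back end."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    body = "\n\n".join(p.text for p in root.findall("para"))
    return ("\\documentclass{article}\n"
            f"\\title{{{title}}}\n"
            "\\begin{document}\n"
            "\\maketitle\n"
            f"{body}\n"
            "\\end{document}\n")

tex_source = xml_to_latex(ARTICLE)
# tex_source would then be handed to a TeX engine (pdflatex etc.) invisibly,
# so the author never sees any TeX code.
```

The point of the sketch is only the division of labour: the XML carries the content, and the TeX engine does the pagination out of sight.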

>> Here are my points, distilled:
>>    - I like the idea of HTML/CSS/Javascript creating fixed pages to be
>>    read on screen with all kinds of interactivity
>>    - I still question trying to create footnotes, floating figures and
>>    tables, and typographic niceties which have evolved primarily for print
>>    on paper, in the browser. To me, floating items apply only to print, so
>>    interactivity is not needed. Why not pass the info to an engine that
>>    knows how to do it well?
> Is there not also a point in having footnotes and floating figures in
> ebooks (and have those still work when the user changes the font size
> level)?

Floats are a matter of opinion. I would say no: I don't want to flick to
the next page and back again; I want to hover or click and have the figure
pop up.
Floats have been needed because of the obvious limitations of print. My
preference for footnotes is similar, i.e. click or touch screen to get more
info. But that can be a user's decision. We should have renderers that
produce whatever a user prefers.

>>    - Floating items, complex math, large footnotes that need to break
>>    across pages, and many other complex pagination problems have already
>>    been solved in TeX. These are not trivial problems and I worry
>>    about this working group reinventing the wheel, by starting to specify the
>>    basics of pagination from scratch. In my opinion, in the end the only way
>>    to solve the problem is to rewrite TeX in JavaScript!
> I have also been thinking of LaTeX in Javascript. But as far as I can
> tell, in 2015 that would still be too slow. TeXLive is a few GB large,
> and if the user should wait for a few GB to download before the page is
> rendered, that likely wouldn't work. In a few years, when a few GB is
> nothing and processors are faster, this may be a viable alternative.

You got it wrong here, Johannes. ;-) TeXLive contains every possible style
file you might want – tens of thousands of them. The basic TeX compiler is
only about 500 KB! Remember it is 35 years old, so it had to run on
mainframes, which is why it is so fast. Even with a few basic style files I
don't think an automated pagination system would exceed 2 MB.

> As for these other items: I agree, they are quite complex. But it should
> be possible to do a "simple version" of most of those features that
> renders quickly, and then a more complex one that will create even smoother
> designs for the situations when one has time.
>>    - Another problem I have is holding all our information in HTML as
>>    opposed to XML. I worry about how clean and semantic the content will
>>    be; after all, HTML was designed to be forgiving, so even bad content
>>    will look good. We are all excited about the amazing gizmos in HTML and
>>    how the browser is the new publishing model, but what about in 10, 50
>>    or 100 years' time? Will these HTML files still make sense? What happens
>>    when the browser is superseded? I am all for HTML tools and
>>    interactivity, but I suggest the definitive content should be XML, not
>>    HTML.
> Good question. I would guess that it depends on what features. H1-H6 and
> P elements will likely still be readable for a long time. Also XML files
> will be readable, but who can turn them into something visually attractive?
> Already now there is no perfect WYSIWYG XML editor, and as we can
> see with XSL-FO, the future of turning XML into PDFs via a common standard
> is not secure either. And we are just a handful of years after "peak XML
> standardization" and likely still haven't reached "peak XML usage". So how
> about in 200 years?

The whole point of XML is that it is a description of content, not of form.
Once we worry about form or look, we lose sight of future-proofing our content.
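To make that content/form split concrete, here is a toy Python sketch; the `<ref>` element and its fields are invented for illustration, not from any real schema:

```python
import xml.etree.ElementTree as ET

# The XML records *what* each piece of content is, never how it should look.
record = ET.fromstring(
    "<ref><author>Oldenburg</author><year>1665</year></ref>")

author = record.findtext("author")
year = record.findtext("year")

# Form is decided later, and can change without touching the stored content:
as_html = f"<cite>{author} ({year})</cite>"
as_plain = f"{author}, {year}"
```

A future renderer can invent an entirely new presentation from the same record, which is exactly the future-proofing argument being made above.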

> The situation with LaTeX is somewhat similar (see below). No one can quite
> know what will happen and which standards will survive, but with HTML we at
> least know that the number of users is extremely high, so there is a
> certain chance that those files will survive for a good while. That being
> said, the safest is probably to store files in several formats for
> long-term storage. In Fidus Writer we therefore used both simple HTML and
> simple LaTeX as storage formats for user content.

Forgive me, but storing in several formats is absolutely the worst thing
you can do. What if there is a difference between the files? Which one is
right? Who knows? No one! Already there is a problem brewing: go to any
open access journal (PeerJ, PLOS, Frontiers, etc.) and pick a paper. You will
find the paper has a DOI – the definitive version of record. But which
*format* is the version of record: the XML, the HTML or the PDF? None of
the publishers has the courage to nominate one! Of course they all know it
should be the XML, but only the PDF has been proofread. No one looks at the
XML – except me!

I think that the excitement we are all experiencing with HTML (myself
included) is going to have bad consequences in the future unless we set some
really firm rules. It is now the 350th anniversary of the first scholarly
journal, and we can still read it with no ambiguity, so it has been amazingly
future-proof. 350 years from now, will scholars be able to read our
scientific literature without ambiguity? I doubt it!

>> Actually TeX is the fastest page renderer. Standard TeX files create
>> pages at over 100 pages a second on a normal laptop, including complex math
>> and footnotes. And I am surprised you had problems running old files. You
>> must have been using style files which had not been maintained. The TeX
>> engine has been frozen for 30 years!
> Yes, and that's why I thought LaTeX was a great idea some 15 years ago. But
> then in 2003 I suddenly had to open some files from 1996 created by someone
> else, and I spent about a week figuring out how to rewrite the macros and
> going through the files by hand fixing small things that had changed.
> Shortly thereafter I received a file that was just a few months old but
> had been written on a Mac (I was running Linux), and the line endings were
> different, so I had to figure out how to convert them.
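[That particular mismatch – classic Mac CR versus Unix LF line endings – is at least mechanical to fix. A small Python sketch:]

```python
def normalize_line_endings(text):
    """Convert CRLF (Windows) and bare CR (classic Mac) endings to LF (Unix).

    CRLF is handled first so that it does not become two newlines.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")

# A classic-Mac-era TeX fragment, with CR line endings:
mac_file = "\\documentclass{article}\r\\begin{document}\rHi\r\\end{document}\r"
unix_file = normalize_line_endings(mac_file)
# unix_file now uses "\n" throughout; TeX itself accepts either.
```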
> Then along came direct support for other characters without using
> shorthands, via XeTeX, and support for TTF fonts in LuaTeX, each in
> slightly different ways. And again lots of stuff needed to be changed by
> hand, or by a script which I would spend up to a few days developing, etc.
> Then suddenly the maintainer of the main bibliography package I had been
> using, biblatex, disappeared into thin air. Others eventually tracked him
> down and took over package maintainership.
> And even a few months ago, when I acquired a new laptop with a new version
> of Linux Mint, I couldn't just use my CV compiler[1] as I used to, because
> the version of TeXLive that is available in the package manager has some
> bugs that have been fixed upstream but haven't yet made it into the
> packaged version, so I needed to add some extra lines of code I found on
> some random website.
> Unlike help for HTML, which can easily be found in books and online
> documents, LaTeX help can mostly be found only in obscure places, as the
> information about how to do one particular thing correctly at times only
> exists in the heads of 2-3 developers worldwide.
> Of course it has always been possible to get at the content, and with some
> time spent on the internet in forums, it's always fixable in the end. For a
> big organization that can afford a development team of 5-6 people who can
> spend all their time on this, it is likely no big deal to pay for such
> conversions. But the question is of course whether it will continue to be
> developed, or whether at some stage too many people say "well, LaTeX is
> really good looking, but HTML can do just about all of it, and it's a lot
> easier for me to understand how to modify it, so I'll stick with that". At
> least for now it looks like HTML has the advantage in numbers.

Kaveh Bazargan
River Valley Technologies
+44 7771 824 111
Received on Thursday, 6 August 2015 14:50:39 UTC
