
Re: Prioritisation

From: Johannes Wilm <johanneswilm@vivliostyle.com>
Date: Thu, 6 Aug 2015 17:40:51 +0200
Message-ID: <CABkgm-TqFZzj2GXPEvYMkSAGQ6uQKN-5BvN=-jvHjAFs7KvBOg@mail.gmail.com>
To: Kaveh Bazargan <kaveh@rivervalleytechnologies.com>
Cc: Dave Cramer <dauwhe@gmail.com>, Richard Ishida <ishida@w3.org>, W3C Digital Publishing Discussion list <public-digipub@w3.org>
On Thu, Aug 6, 2015 at 4:49 PM, Kaveh Bazargan <
kaveh@rivervalleytechnologies.com> wrote:

> On 6 August 2015 at 14:58, Johannes Wilm <johanneswilm@vivliostyle.com>
> wrote:
>> On Thu, Aug 6, 2015 at 1:05 PM, Kaveh Bazargan <
>> kaveh@rivervalleytechnologies.com> wrote:
>>> Hi Johannes
>>> I am flattered by your comprehensive reply. My comments regarding TeX
>>> are below, but I might not have explained myself well...
>>> I am not suggesting anyone should use TeX code, or even be aware that
>>> TeX/LaTeX is involved. The point is that it is a back end automated page
>>> make up engine. So XML/HTML can be converted to PDF very fast and at very
>>> high quality with the TeX engine invisibly doing the work.
>> Ok, so you are proposing converting XML to LaTeX and Epub/HTML on a
>> backend system? The main problem with that is the conversion mechanisms
>> just about always need human intervention and that it's hard to impossible
>> to get XML input files from authors.
> But conversion of XML to a fixed layout view is the same as HTML to fixed
> layout, is it not, which is the aim of this group? Would that not need the
> same human intervention? The human intervention is needed because
> publishers want the same look as a journal they have had for decades. With
> a little modification the process can be entirely automated.

Yes, except that the HTML-to-PDF renderer present in browsers is used by
more than just us book people, which gives it slightly higher chances of
technical survival over the next few years.

>>> Here are my points, distilled:
>>>    - I like the idea of HTML/CSS/Javascript creating fixed pages to be
>>>    read on screen with all kinds of interactivity
>>>    - I still question trying to create footnotes, floating figures and
>>>    tables, and typographic niceties which have primarily evolved for print on
>>>    paper, being done in the browser. To me, floating items only apply to
>>>    print, so no interactivity is needed. Why not pass the info to an
>>>    engine that knows how to do it well?
>> Is there not also a point in having footnotes and floating figures in
>> ebooks (and have those still work when the user changes the font size
>> level)?
> Floats are a matter of opinion. I would say no, I don't want to flick to
> the next page and back again. I want to hover or click and fig pops up.
> Floats have been needed because of the obvious limitations of print. My
> preference for footnotes is similar, i.e. click or touch screen to get more
> info. But that can be a user's decision. We should have renderers that
> produce whatever a user prefers.

Agreed, this should be user preference. I think on scientific ebooks, for
example, I would still like real footnotes at the bottom and I think the
lack of good footnote support is why many still use PDFs instead of epubs
for certain types of texts.

>>>    - The problem of floating items, complex math, large footnotes that
>>>    need to break across pages, and many other complex pagination problems have
>>>    already been solved in TeX. These are not trivial problems and I worry
>>>    about this working group reinventing the wheel, by starting to specify the
>>>    basics of pagination from scratch. In my opinion, in the end the only way
>>>    to solve the problem is to rewrite TeX in JavaScript!
>> I have also been thinking of LaTeX in Javascript. But as far as I can
>> tell, in 2015 that would still be too slow. TeXLive is a few GB large,
>> and if the user should wait for a few GB to download before the page is
>> rendered, that likely wouldn't work. In a few years, when a few GB is
>> nothing and processors are faster, this may be a viable alternative.
> You got it wrong here, Johannes. ;-) TeXLive contains every possible style
> file you might want – 10,000s. The basic TeX compiler is only 500K!
> Remember it is 35 years old, so had to run on mainframes, which is why it
> is fast. Even with a few basic style files I don't think it would exceed
> 2Mb for an automated pagination system.

Right. But you will likely need some of those packages if you want to
let people render their documents.

The use case I had was that I had people who were to write scientific
articles and I was thinking of how I could get them to render the documents
themselves, without having to go through the process of installing LaTeX
and doing weird things on the command line. So in that case I couldn't know
exactly what packages my users would need.

So maybe this is indeed possible if one locks everyone down to a small
subset of everything. I just haven't seen that in action ever. Linux always
seems to ask me to just install another 600 MB of files whenever I try to
compile someone else's file.

> Forgive me, but storing in several formats is absolutely the worst thing
> you can do. What if there is a difference between the files? Which one is
> right? Who knows? No one! Already there is a problem brewing. Go to any
> open access journal (PeerJ, Plos, Frontiers etc) and pick a paper. You will
> find the paper has a DOI – the definitive version of record. But which
> *format* is the version of record, the XML, the HTML or the PDF? None of
> the publishers have the courage to nominate one!! Of course they all know
> it should be the XML but only the PDF has been proofread. No one looks at
> XML – except me!!

I know, I know. Among software developers we have similar sayings when it
comes to data.

And still: if you saved the same program in 1996 on both a CD-ROM and a
3.5" disc, and you can read both and they differ, you may have trouble
figuring out which one is "better". But if you only save to one of them,
you may end up not being able to access it at all. So it ends up being
wiser to make multiple copies in different formats anyway, even if it may
put you in that dilemma.
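
As a rough illustration of that dilemma: for copies of the same file kept
on different media, comparing checksums at least tells you whether they
have diverged. It cannot tell you which copy is the right one. This is
only a minimal sketch. The function names digest and group_copies are
made up for the example, and it only applies to copies that are meant to
be byte-identical, not to the same text stored as XML, HTML and PDF.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def group_copies(paths: list[Path]) -> dict[str, list[Path]]:
    """Group paths by content digest (hypothetical helper).

    One group means all copies are byte-identical; more than one
    group means the copies have diverged.
    """
    groups: dict[str, list[Path]] = {}
    for p in paths:
        groups.setdefault(digest(p), []).append(p)
    return groups
```

If group_copies returns more than one group, you are in the situation
described above: you know the copies differ, and a human still has to
decide which one to trust.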

I assume that is the same reason people not only maintain digital copies, but
also paper copies (on special paper) that are then stored on different

> I think that the excitement we are all experiencing with HTML (including
> myself) is going to have bad consequences in future unless we set some
> really firm rules. It is now the 350th anniversary of the first scholarly
> journal. We can still read it with no ambiguity. So it has been amazingly
> future-proof. 350 years from now, will scholars be able to read our
> scientific literature without ambiguity? I doubt it!

It's not impossible. But it's not just HTML. It's everything about our
times that requires fast changes which means that none of us can be sure we
will even be able to read this conversation in 5 years' time.

> --
> Kaveh Bazargan
> Director
> River Valley Technologies
> @kaveh1000
> +44 7771 824 111
> www.rivervalleytechnologies.com
> www.bazargan.org
Received on Thursday, 6 August 2015 15:41:26 UTC
