Re: Prioritisation from Bill McCoy on 2015-08-06 (public-digipub@w3.org from August 2015)

From: Bill McCoy <whmccoy@gmail.com>
Date: Thu, 6 Aug 2015 09:35:21 -0700
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Johannes Wilm <johanneswilm@vivliostyle.com>, Kaveh Bazargan <kaveh@rivervalleytechnologies.com>, Dave Cramer <dauwhe@gmail.com>, Richard Ishida <ishida@w3.org>, W3C Digital Publishing Discussion list <public-digipub@w3.org>
Message-ID: <CAJ0DDbA6CHEDxj-WsVV=erFPX3iTeZXW9n-GwZ=Ft3-ZQeLYfw@mail.gmail.com>

I agree with Leonard about the deficiencies in today's browser printing
pipelines but wanted to add a couple things:

- EPUB print-on-demand solutions are starting to appear; I saw one being
offered for the Japanese market last month at Tokyo International Book
Fair. There has been interest expressed in an initiative on this by folks
from major print/POD players (HP, Toppan, Dai Nippon Printing, Ingram) as
well as the accessibility community. No active project is yet under way in
IDPF/Readium but I anticipate there may be something soon and would love to
have this coordinated with this group's activities. Today's prevalent
solutions in the CSS formatting space (AntennaHouse and Prince) are
proprietary so having something open source, built on a browser engine, and
helping to advance the relevant open standards (esp. CSS) would seem
helpful. I don't think this necessarily needs to wait for other parts of
the EPUB-WEB vision to be realized so it could be a good candidate for
near-term efforts that would yield rapid useful results.

- I think it would make most sense for this group to focus on OWP->PDF that
is good for high-quality printing... I believe the "interactive PDF" has
passed its sell-by date and fundamentally trying to map HTML forms into
PDF forms, HTML JS APIs into Acrobat scripting APIs etc. seems both
unlikely to be fruitful and unlikely to be very useful since we already
have EPUB and are moving in the direction of EPUB-WEB.  There are some
areas of functionality missing from EPUB that are present in PDF that could
be helpful to OWP in general and thus in scope for EPUB-WEB and thus this
group. For example, digital signatures (in the legal sense). Working on
filling these gaps seems better to me than worrying about PDF as anything
other than its design center and primary use case as a replica of paper.

--Bill



On Thu, Aug 6, 2015 at 8:52 AM, Leonard Rosenthol <lrosenth@adobe.com>
wrote:

> >Yes, except that the HTML-to-PDF renderer present in browsers is used by
> more than just us book people,
> >which gives it slightly higher chances of technical survival over the
> next few years.
> >
> Unfortunately, those converters produce output that is useless for
> anything other than printing (and in some cases, not even that).   All
> sense of semantics and non-static content have been lost and pagination is
> arbitrary and uncontrollable (which is what started this thread, IIRC).
> It’s also quite unclear when the process should proceed – when are scripts
> “done” and the content is “ready”.   We recently undertook a detailed
> product/technology comparison in this area, so this is not speculation but
> fact.
>
> This (OWP->PDF) is one of the areas that has brought us back into active
> participation as our customers and the industry as a whole are being damaged
> by the lack of standardization (or even implementation!) in this area.
> There are a lot of great (draft!) specs out there that could potentially
> resolve some of these issues, but it will require a group (such as this
> one) to pick the one(s) that we feel are solving the correct problems,
> validate that the concerns of all constituents (not just book and magazine
> publishers) are met, and work to see them implemented in the key UA
> technologies.  Not a “quick fix” problem – but the sooner we start, the
> sooner it will be resolved.
>
> Leonard
>
> From: Johannes Wilm
> Date: Thursday, August 6, 2015 at 11:40 AM
> To: Kaveh Bazargan
> Cc: Dave Cramer, Richard Ishida, W3C Digital Publishing Discussion list
> Subject: Re: Prioritisation
> Resent-From: <public-digipub@w3.org>
> Resent-Date: Thursday, August 6, 2015 at 11:41 AM
>
>
>
> On Thu, Aug 6, 2015 at 4:49 PM, Kaveh Bazargan <
> kaveh@rivervalleytechnologies.com> wrote:
>
>>
>>
>> On 6 August 2015 at 14:58, Johannes Wilm <johanneswilm@vivliostyle.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Aug 6, 2015 at 1:05 PM, Kaveh Bazargan <
>>> kaveh@rivervalleytechnologies.com> wrote:
>>>
>>>> Hi Johannes
>>>>
>>>> I am flattered by your comprehensive reply. My comments regarding TeX
>>>> are below, but I might not have explained myself well...
>>>>
>>>> I am not suggesting anyone should use TeX code, or even be aware that
>>>> TeX/LaTeX is involved. The point is that it is a back end automated page
>>>> make up engine. So XML/HTML can be converted to PDF very fast and at very
>>>> high quality with the TeX engine invisibly doing the work.
>>>>
>>>
>>> Ok, so you are proposing converting XML to LaTeX and Epub/HTML on a
>>> backend system? The main problem with that is the conversion mechanisms
>>> just about always need human intervention and that it's hard to impossible
>>> to get XML input files from authors.
>>>
>>
>> But conversion of XML to a fixed layout view is the same as HTML to fixed
>> layout, which is the aim of this group, is it not? Would that not need the
>> same human intervention? The human intervention is needed because
>> publishers want the same look as a journal they have had for decades. With
>> a little modification the process can be entirely automated.
>>
>
> Yes, except that the HTML-to-PDF renderer present in browsers is used by
> more than just us book people, which gives it slightly higher chances of
> technical survival over the next few years.
>
>
>
>>
>>
>>>
>>>
>>>>
>>>> Here are my points, distilled:
>>>>
>>>>    - I like the idea of HTML/CSS/Javascript creating fixed pages to be
>>>>    read on screen with all kinds of interactivity
>>>>    - I still question trying to create footnotes, floating figures and
>>>>    tables, and typographic niceties which have primarily evolved for print on
>>>>    paper, being done in the browser. To me, floating items only apply to
>>>>    print, so interactivity is not needed. Why not pass the info to an
>>>>    engine that knows how to do it well?
>>>>
>>> Is there not also a point in having footnotes and floating figures in
>>> ebooks (and have those still work when the user changes the font size
>>> level)?
>>>
>>
>> Floats are a matter of opinion. I would say no, I don't want to flick to
>> the next page and back again. I want to hover or click and fig pops up.
>> Floats have been needed because of the obvious limitations of print. My
>> preference for footnotes is similar, i.e. click or touch screen to get more
>> info. But that can be a user's decision. We should have renderers that
>> produce whatever a user prefers.
>>
>
> Agreed, this should be user preference. I think on scientific ebooks, for
> example, I would still like real footnotes at the bottom and I think the
> lack of good footnote support is why many still use PDFs instead of epubs
> for certain types of texts.
>
>
>>
>>>>    - The problem of floating items, complex math, large footnotes that
>>>>    need to break across pages, and many other complex pagination problems have
>>>>    already been solved in TeX. These are not trivial problems and I worry
>>>>    about this working group reinventing the wheel, by starting to specify the
>>>>    basics of pagination from scratch. In my opinion, in the end the only way
>>>>    to solve the problem is to rewrite TeX in JavaScript!
>>>>
>>>>
>>> I have also been thinking of LaTeX in Javascript. But as far as I can
>>> tell, in 2015 that would still be too slow. TeXLive is a few GB large,
>>> and if the user should wait for a few GB to download before the page is
>>> rendered, that likely wouldn't work. In a few years, when a few GB is
>>> nothing and processors are faster, this may be a viable alternative.
>>>
>>
>> You got it wrong here, Johannes. ;-) TeXLive contains every possible
>> style file you might want – 10,000s. The basic TeX compiler is only 500K!
>> Remember it is 35 years old, so had to run on mainframes, which is why it
>> is fast. Even with a few basic style files I don't think it would exceed
>> 2 MB for an automated pagination system.
>>
>
> Right. But you will likely need some of those packages if you want
> to let people render their documents.
>
> The use case I had was that I had people who were to write scientific
> articles and I was thinking of how I could get them to render the documents
> themselves, without having to go through the process of installing LaTeX
> and doing weird things on the command line. So in that case I couldn't know
> exactly what packages my users would need.
>
> So maybe this is indeed possible if one locks everyone down to a small
> subset of everything. I just haven't seen that in action ever. Linux always
> seems to ask me to just install another 600 MB of files whenever I try to
> compile the file of someone else.
>
>
>
>> Forgive me, but storing in several formats is absolutely the worst thing
>> you can do. What if there is a difference between the files? Which one is
>> right? Who knows? No one! Already there is a problem brewing. Go to any
>> open access journal (PeerJ, Plos, Frontiers etc) and pick a paper. You will
>> find the paper has a DOI – the definitive version of record. But which
>> *format* is the version of record, the XML, the HTML or the PDF? None of
>> the publishers have the courage to nominate one!! Of course they all know
>> it should be the XML but only the PDF has been proofread. No one looks at
>> XML – except me!!
>>
>
> I know I know. Among software developers we have similar sayings when it
> comes to data.
>
> And still: if you have the same program saved in 1996 both on a CD-ROM and
> on a 3.5" disc, and you can read both but they differ, you may have trouble
> figuring out which one is "better". But if you only save to one of them,
> you may end up not being able to access it at all. So it ends up being
> wiser to make multiple copies in different formats anyway, even if it may
> put you in that dilemma.
>
> I assume that is the same reason people not only maintain digital copies, but
> also paper copies (on special paper) that are then stored on different
> continents.
>
>
>>
>> I think that the excitement we are all experiencing with HTML (including
>> myself) is going to have bad consequences in future unless we set some
>> really firm rules. It is now the 350th anniversary of the first scholarly
>> journal. We can still read it with no ambiguity. So it has been amazingly
>> future-proof. 350 years from now, will scholars be able to read our
>> scientific literature without ambiguity? I doubt it!
>>
>
> It's not impossible. But it's not just HTML. It's everything about our
> times that requires fast changes, which means that none of us can be sure we
> will even be able to read this conversation in 5 years' time.
>
>
>
>> --
>> Kaveh Bazargan
>> Director
>> River Valley Technologies
>> @kaveh1000
>> +44 7771 824 111
>> www.rivervalleytechnologies.com
>> www.bazargan.org
>>
>
>
Received on Thursday, 6 August 2015 16:35:51 UTC