Re: Proposal: PDF alternative using HTML (ZIP/GZIP)

From: Craig Francis <craig.francis@gmail.com>
Date: Thu, 28 Jan 2016 22:47:54 +0000
Cc: Bill McCoy <bmccoy@idpf.org>, Leonard Rosenthol <lrosenth@adobe.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Ivan Herman <ivan@w3.org>, Nick Ruffilo <nickruffilo@gmail.com>
Message-Id: <07F792A2-7DA6-4F79-8F5B-2767EE9E06F0@gmail.com>
To: Deborah Kaplan <dkaplan@safaribooksonline.com>
Hi Deborah,

Thank you for your feedback.

I should point out that I also don't want to create new formats if we can avoid it:

https://xkcd.com/927/

However, none of the formats mentioned so far really seem to fit that well with the example documents I've mentioned.

Well, PDFs kind of do, but they have their own problems, which is why I'm proposing an HTML solution that is similar to EPUB.

I've just summarised these points at:

https://github.com/craigfrancis/wdoc/

The reason I would still like to propose this new format is because it's basically re-using lots of existing standards, so it's really not much of a new format/standard.

I'm also really glad to see that you take accessibility seriously.

This is one of the three main points I really want to address (along with security, and how easy it is to create the documents).

HTML is one of the most accessible formats available (well, excluding some weird things that developers do with it), and this is why I've been interested in EPUB/PWP, but I do not think they work well for documents like:

- Invoices.
- Terms and conditions.
- Contracts.
- Reports, assessments, statistics.
- Bank statements.

Mostly because PWP would allow these documents to change, and EPUB is currently used with e-readers, where the use is a little different (e.g. storing the documents in a library by default, requiring a Table of Contents, etc.).


> On 28 Jan 2016, at 19:51, Deborah Kaplan <dkaplan@safaribooksonline.com> wrote:
> Bill noticed I'd replied only to him! Resending to the list.
> I'm slightly changing around the order of which of Bill's points I am responding to, to put the most important ones at the top.
> On Wed, Jan 27, 2016 at 2:44 PM, Bill McCoy <bmccoy@idpf.org> wrote:
> <snip> 
> And most of all I don't think we should even consider forking yet another effort on something different. We already have a "PDF alternative using HTML (ZIP/GZIP)", it's called EPUB, it's already widely utilized and is expanding into new segments of content publishing, and with the PWP work we're hopefully going to take that alternative much further towards full convergence with OWP.
> Yes, definitely! If an existing spec is inadequate, see if the existing spec can be modified. We don't need more specs if they can be avoided.
> <snip>
> This makes PDF inherently not mobile-ready (in terms of adjustment of content to different sized screens), not very accessible, and not very semantically intelligible in various machine-processing workflows. Computers can drive cars so clearly they can reconstruct text and structure from visual information, but it's a heuristic process. As Leonard indicates, it's possible in theory to create accessible PDFs, but since the logical structure features were grafted onto PDF's sequence-of-page-images architecture years after the fact, the result is pretty awkward, which is one reason that most PDF creation tools (including many from Adobe) don't even attempt it at all, much less to the level needed to meet WCAG 2.0 standards (it is nearly impossible to find PDF content that is actually conformant to the PDF/UA profile).
> <snip> 
> But I do think we need to tease apart the key attributes and not conflate "reliable" with "packaged" with "fixed-layout". Portable Web Publications need to support all of these attributes even though individual instances may choose which ones they fully deliver on. I would, with hesitation, even add "accessible" to this list of separable attributes.
> Yes, this, absolutely, except I do not hesitate over "accessible". Reliable, packaged, and fixed layout are three different qualities. A document can have one of them, or it can have all three of them. I would restate the "accessibility" attribute as "something can be reliable, packaged, and fixed layout without being accessible at all, but it shouldn't be." (Technically, fixed-layout and accessibility can be conflicting constraints; if a document is unreadable to a sighted user without three columns, should it also be unreadable to someone using a screen reader? And if not, why are the columns so important that sighted users are required to use them? But that's a digression.)
> Fixed-layout would not be such a problem for _everyone_ if accessibility were built in as a baseline spec. As an example, for all PDF's very real utility, people would not have such strong negative feelings about it if it were possible to reflow on a portrait display such as a laptop screen, or a small display such as a mobile device. If accessibility were part of the minimal baseline, then tools could assume reflow would reliably work, and that print-formatted brochure could be equally readable on your Apple Watch.
> I would like all content in the next-generation portable document format to be accessible, but as a broad-based part of OWP it's not clear that this is realistic to set as a baseline requirement (hence one thing IDPF is considering in conjunction with our EPUB 3.1 revision is separating accessibility requirements into a layered profile, separate from the base specification).
> Accessibility is always the requirement that drops off the bottom. We have an opportunity to make that harder for people to do. If a validator fails on accessibility check failure (at least for the machine readable components of accessibility; obviously whether alt text is useful can't be automated, while its mere presence can be), then it's not valid.
> Deborah Kaplan

Received on Thursday, 28 January 2016 22:48:39 UTC
