Re: [Minutes] EPUB Virtual Locator TF, 2021-06-02 from Hadrien Gardeur on 2021-06-04 (public-epub-wg@w3.org from June 2021)

From: Hadrien Gardeur <hadrien@demarque.com>
Date: Fri, 4 Jun 2021 11:39:54 +0200
To: Dan Lazin <dlazin@google.com>
Cc: Laurent Le Meur <laurent.lemeur@edrlab.org>, "Reid, Wendy" <wendy.reid@rakuten.com>, W3C EPUB 3 Working Group <public-epub-wg@w3.org>
Message-ID: <CAHqp8zh0cXjd6YwO3JW_k+kS23Z3sS2BgF1yYB1nxNd44Rok9Q@mail.gmail.com>
Hello Dan,

Regarding the vocabulary, I completely agree that "pages" shouldn't be used
outside the context of print pages, as defined in the page list.

In the Readium community, we've been discussing the right terminology for
over three years now and although we've been using "positions" for what
this group refers to as a "virtual locator", there's regular pushback
from external developers. I believe that the main issue is that end-users
are not familiar with any term other than "page".

Regarding the counting algorithm, we've also had back and forth discussions
over this for years within Readium. Since a lot of implementers in our
community previously relied on RMSDK for EPUB rendering, we've mostly
aligned with what Adobe has been using since 2007: 1024 bytes per position.
Even with such a straightforward approach, it still raises a number of
questions when we implement it:

   - Should you calculate this based on the compressed or uncompressed size
   of the resource?
   - What about encrypted resources?
   - From the context of a webview, how can you calculate the equivalent of
   this progression in bytes?
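
As a rough illustration of the byte-based counting described above, here is a
minimal Python sketch. It is not Readium's or RMSDK's actual code: the function
names, the rounding up, and the "at least one position per resource" rule are
all assumptions made for the example. Only the 1024-byte figure comes from the
discussion.

```python
import math

# 1024 bytes per position is the figure cited above. Whether "bytes" means
# compressed or uncompressed bytes is exactly the open question, so the
# caller decides which length to pass in.
BYTES_PER_POSITION = 1024


def count_positions(length_in_bytes, bytes_per_position=BYTES_PER_POSITION):
    """Positions in one resource (assumption: round up, minimum of one)."""
    return max(1, math.ceil(length_in_bytes / bytes_per_position))


def position_ranges(resources):
    """Turn (href, length) pairs in reading order into (href, first, last).

    Positions are counted per resource and then chained into one 1-based
    sequence for the whole publication.
    """
    ranges = []
    next_position = 1
    for href, length in resources:
        n = count_positions(length)
        ranges.append((href, next_position, next_position + n - 1))
        next_position += n
    return ranges


def position_for_progression(first, last, progression):
    """Map a 0.0-1.0 progression inside a resource to one of its positions.

    A webview cannot easily report a byte offset, so this hypothetical
    helper takes a ratio (for example a scroll ratio) instead.
    """
    count = last - first + 1
    index = min(count - 1, int(progression * count))
    return first + index


if __name__ == "__main__":
    book = [("ch1.xhtml", 3000), ("ch2.xhtml", 100), ("ch3.xhtml", 2048)]
    for href, first, last in position_ranges(book):
        print(href, first, last)
```

With Python's zipfile module, the two candidate lengths would be
ZipInfo.compress_size and ZipInfo.file_size, which will usually give different
position counts for the same resource.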

> Might I suggest that it sounds like you would enjoy a few task force
> meetings? :)
>

I would also be interested in participating in these calls but this has
been very difficult to achieve:

   - a number of these TF calls are in the middle of the night for people
   living in Europe
   - there are too many different TF calls in a given week

I know that Laurent and other key members of the Readium community (Daniel
for example) are in the same situation.

We've been trying to keep up with this group mostly through the meeting
notes, but we'd love to figure out an easier way to interact.

Among other things, we've been worried to see so many mentions of CFI. I
personally think that this ship has sailed and that we should instead align
with the Web and the work done for example on text fragments (other
Web-based solutions have followed that approach, for example Hypothesis).

While CFI/XPath and other tree-based approaches have the benefit of
pointing very precisely in a document, they're also extremely fragile and
can be expensive to compute in many scenarios.
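
To make the fragility point concrete, here is a small Python sketch. The
child-index path is only a simplified stand-in for a tree-based pointer (it is
not CFI or XPath syntax), and both helper names are hypothetical. It shows the
same path pointing at a different element once a sibling is inserted, while a
text lookup still finds the passage.

```python
import xml.etree.ElementTree as ET


def resolve_child_path(root, path):
    """Follow a list of 0-based child indexes down from the root element."""
    node = root
    for index in path:
        node = node[index]
    return node


def find_by_text(root, needle):
    """Return the first element whose own leading text contains the needle."""
    for node in root.iter():
        if needle in (node.text or ""):
            return node
    return None


original = ET.fromstring(
    "<body><p>Intro</p><p>Call me Ishmael.</p></body>"
)
# Same content, but an <aside> was added before the first paragraph.
revised = ET.fromstring(
    "<body><aside>Note</aside><p>Intro</p><p>Call me Ishmael.</p></body>"
)

path = [1]  # "second child of body", recorded against the original markup

if __name__ == "__main__":
    print(resolve_child_path(original, path).text)
    print(resolve_child_path(revised, path).text)
    print(find_by_text(revised, "Call me Ishmael").text)
```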

We've favored instead:

   - URLs (instead of an index in the spine)
   - text (including surrounding text)
   - and media-specific fragments

Best,
Hadrien


On Thu, 3 Jun 2021 at 23:51, Dan Lazin <dlazin@google.com> wrote:

> +Wendy for visibility
>
> We haven't gotten that far yet, but my impression of the direction we're
> heading in is that (perhaps) reading systems would continue to use their
> existing counting algorithms for the time being, but we might suggest that
> the results be renamed — for example, "screens" instead of "pages." As an
> example, Apple Books already distinguishes between pages and screens; in a
> book that has a page-list, you can tap to switch between screen counts
> (which recalculates upon reflow) and page counts (which doesn't).
>
> Might I suggest that it sounds like you would enjoy a few task force
> meetings? :)
>
>
>
> On Jun 3, 2021, at 12:50 PM, Laurent Le Meur <laurent.lemeur@edrlab.org>
> wrote:
>
> Ok then we (Readium developers) can help. The next question is: do we
> agree that reading systems which are well known on the market but will not
> change their algorithm (because they are legacy, because it would be a
> breaking change for them ...) may not support this new standard, but ...
> well this is life?
>
> L
>
> On 3 Jun 2021, at 17:37, Dan Lazin <dlazin@google.com> wrote:
>
> Hey, Laurent. We are indeed talking (talking) about standardizing the
> algorithm. The short version is "use page-list if present, and if not do
> something dead-simple like divide by 1000."
>
> We're still pretty far from writing a spec, but we are talking about
> standardization here.
>
>
> On Jun 3, 2021, at 9:55 AM, Laurent Le Meur <laurent.lemeur@edrlab.org>
> wrote:
>
> Hi everybody,
>
> Sorry for not having been able to participate in the call.
>
> *About use case line 2* ("A teacher wants to ask students to go to a
> certain location in an EPUB which contains no explicit page-list. The
> students are using different types of reading systems, nevertheless all
> reach the same page. ")
>
> We are currently working in this area in the Readium Developers'
> community. I don't want to be pessimistic but I believe this will not
> happen. If page lists are present, ok the mechanism is documented and it is
> not about virtual locators, but UX and the ability to jump to a location
> identified by an html fragment id. But if no page lists are present, each
> reading system has its recipe to calculate "positions" (as we call it at
> Readium) aka virtual page numbers. "positions" are calculated per resource
> first, then agglomerated to form a sequence. For instance it may be the
> size of the compressed file (in the zip) divided by 1024. Or the size of
> the (decompressed) html content divided by 2500. Either this group wants to
> standardize the algorithm, or the use case is IMHO void.
>
> Best regards
> Laurent
>
>
> On 3 Jun 2021, at 15:23, Ivan Herman <ivan@w3.org> wrote:
>
> Minutes are here:
>
>
> https://www.w3.org/publishing/groups/epub-wg/Meetings/Minutes/2021-06-02-epub-locators
>
> Ivan
>
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +33 6 52 46 00 43
> ORCID ID: https://orcid.org/0000-0003-0782-2704
>
Received on Friday, 4 June 2021 09:41:51 UTC