Re: Prioritisation from Johannes Wilm on 2015-08-05 (public-digipub@w3.org from August 2015)

From: Johannes Wilm <johanneswilm@vivliostyle.com>
Date: Wed, 5 Aug 2015 19:54:50 +0200
To: Nick Ruffilo <nickruffilo@gmail.com>
Cc: Deborah Kaplan <dkaplan@safaribooksonline.com>, Bill Kasdorf <bkasdorf@apexcovantage.com>, Leonard Rosenthol <lrosenth@adobe.com>, Ivan Herman <ivan@w3.org>, Kaveh Bazargan <kaveh@rivervalleytechnologies.com>, Dave Cramer <dauwhe@gmail.com>, W3C Digital Publishing Discussion list <public-digipub@w3.org>, Matthew Hardy <mahardy@adobe.com>
Message-ID: <CABkgm-RvR_1eyrFRaARkg3LMYNqsS8c+9+0+jczdOQY-gD9Kew@mail.gmail.com>
The connection between CSS and pages as markers of distinct content also
depends on whether CSS can be specific enough to ensure that the same
content will land on the same page when the pages are created on the end
user device. If that is the case, there is no need for added metadata about
which content belongs to which page. This information should then even be
extractable by end user devices that divide the content into pages
differently on their own display.

Also, if web technology is used for the printed page as well (as we do with
Vivliostyle), it shouldn't be very difficult to replicate the printed page
exactly on screen, so that the differences between some page concepts may
blur a bit.

That being said, I am not sure it relates 100% to this prioritization
document. Whether or not this is possible depends not so much on the
implementation of a specific spec as on how similarly different rendering
engines implement the various specifications. A 0.1 point difference in the
calculation of margins, due to some difference in the rounding of one
specific letter, could mean that one word is pushed over to another page in
one renderer but not in another, and then all such information is
unreliable.
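
To make the rounding concern concrete, here is a minimal sketch in Python.
It assumes a hypothetical greedy line-filling paginator; the function name
paginate, the word widths, the line width and the lines-per-page value are
all made up for illustration and do not describe how any real rendering
engine works. The two calls differ only in the measured width of the first
word, by 0.1 point.

```python
def paginate(word_widths, line_width, lines_per_page, space):
    """Greedy line filling (illustrative only, not a real engine).

    Returns the 0-based page index on which each word lands.
    """
    pages = []
    line = 0      # index of the line currently being filled
    used = 0.0    # width already used on the current line
    for width in word_widths:
        # Width the line would have if this word were appended to it.
        needed = width if used == 0 else used + space + width
        if used > 0 and needed > line_width:
            # The word does not fit: start a new line with it.
            line += 1
            used = width
        else:
            used = needed
        pages.append(line // lines_per_page)
    return pages


# Hypothetical measurements in points: six words, 100 pt lines, 2 lines/page.
renderer_a = paginate([30.0] * 6, 100.0, 2, 5.0)
# Same text, but the first word is measured 0.1 pt wider.
renderer_b = paginate([30.1] + [30.0] * 5, 100.0, 2, 5.0)

print(renderer_a)
print(renderer_b)
```

In this sketch the first renderer keeps all six words on one page, while
the second pushes the last word onto the next page, so a page reference
derived from one renderer would not hold in the other.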





On Wed, Aug 5, 2015 at 6:51 PM, Nick Ruffilo <nickruffilo@gmail.com> wrote:

> Deborah,
>
> If the discussion assumes PAGE is a high-level concept - then you are
> correct, but I believe - and how I framed my notes (and may have failed) is
> that a PAGE is actually just a media-query - it's a defined set of display
> styles that are applied to structured content based on the form in which it
> is being displayed.
>
> Under that assumption - then it is completely relevant to CSS.  The
> "printed page" is then just a media-query abstraction.  What becomes the
> relevant CSS question is: What abstractions are needed for the structured
> content?
>
> What interactions and states are needed?  what regions are "static"
> (header/footer) vs non-static.  Obviously a reading system could take
> advantage of "position: absolute/fixed;" for a header/footer element, but
> how would that interact with scrolled or flipped chunked content?  regions?
> etc.
>
> I think this is all EXTREMELY relevant to CSS because the page is a visual
> representation.  Unless I'm mistaken, CSS is the visual representation of
> the structured HTML content, no?
>
> -Nick
>
> On Wed, Aug 5, 2015 at 12:13 PM, Deborah Kaplan <
> dkaplan@safaribooksonline.com> wrote:
>
>> This discussion is all quite interesting, and a good one to have, but I
>> just want to clarify -- the way pages and references are being discussed
>> here are more relevant to the meaning of "pages" which is not part of the
>> CSS prioritization document under the original discussion. Unless I am
>> misunderstanding again! In which case maybe we should change the subject
>> line. :-)
>>
>> On Wed, Aug 5, 2015 at 11:21 AM, Bill Kasdorf
>> <bkasdorf@apexcovantage.com> wrote:
>>
>>> All excellent points, Nick—thanks!
>>>
>>>
>>>
>>> I particularly like the "scrolling" vs. "flipping" distinction. I will
>>> steal that!
>>>
>>>
>>>
>>> --Bill
>>>
>>>
>>>
>>> *From:* Nick Ruffilo [mailto:nickruffilo@gmail.com]
>>> *Sent:* Wednesday, August 05, 2015 11:11 AM
>>> *To:* Bill Kasdorf
>>> *Cc:* Leonard Rosenthol; Ivan Herman; Kaveh Bazargan; Johannes Wilm;
>>> Dave Cramer; W3C Digital Publishing Discussion list; Matthew Hardy
>>> *Subject:* Re: Prioritisation
>>>
>>>
>>>
>>> Sorry for the late comments, I struggled to find time to sit down and
>>> read through everyone's great discussion.  Since there are a lot of great
>>> gems here, I wanted to mainly summarize, but also offer some of my own
>>> commentary.
>>>
>>>
>>>
>>> *Pages/Pagination*
>>>
>>> Some of the ways that the "page" concept has been approached is a
>>> simulacrum to the physical book - but I think the abstract goes a whole
>>> step farther.  Keep in mind that in the trade world, you can have a
>>> paperback and hardcover of the SAME content that have different "page"
>>> breakdowns.  Long before CSS, publishers were making use of media-queries
>>> to structure content.  The page is the way in which content is optimally
>>> laid out for the medium on which it is being displayed.
>>>
>>>
>>>
>>> We could even call the "page" the Display-style of structured content
>>> for a given medium.  From this perspective, we account for some content
>>> being "scrolling" and some being "flipped" (what some call paginated - but
>>> I'm trying to avoid using the overloaded word).  Not only are device SIZES
>>> taken into account, but also device capabilities.  Scrolling on a non-touch
>>> non-mouse device might be unreasonable.  And - in the future when there are
>>> new input devices, it would be useful to have a spec that took that into
>>> account.
>>>
>>>
>>>
>>> *Content Placement Markers*
>>>
>>> There is a clear need - at the moment - for markers denoting physical
>>> pages for the needs of accessibility - but as noted in the wonderful
>>> comments here, what we really need is just a way to point to a specific
>>> point in the text.  As long as we have a good location identifier of sorts,
>>> then we can solve this problem and it becomes an issue of authors/authoring
>>> tools to add these links.
>>>
>>>
>>>
>>> *Fixed Layout*
>>>
>>> Given the above definition of the Page/Pagination - fixed layout becomes
>>> just another render option.  Take a cookbook.  A publisher may decide to
>>> have a "recipe" template that shows a picture on the left, ingredients on
>>> the right, and a description below - all absolutely positioned on the
>>> canvas.  They could define this layout for devices that are 1024x768 pixels
>>> or more.  But, they could also define a layout for small devices that are
>>> reflowable - show an image, ingredients below, then instructions.  To have
>>> an entirely different package or content form for fixed layout is
>>> completely absurd (and I should know, as I've created an authoring tool
>>> solely FOR fixed-layout content).
>>>
>>>
>>>
>>> *Thinking abstract about structured content*
>>>
>>> In the ideal - and I do believe many publishers think about things like
>>> this - is to think about things in terms of content.  The display of the
>>> content is part of productization, but it is different for different
>>> mediums (hardcover, paperback, ebook, webook, etc.)  I'm not sure
>>> publishers are masters-of-the-web and know exactly what is best for them,
>>> and that is where we have the opportunity to - given our knowledge of the
>>> web and CSS - provide a guideline for how content should be structured to
>>> maximize layout on multiple devices while not losing accessibility.  I
>>> believe this CAN be achieved.
>>>
>>>
>>>
>>> I hope this makes sense - I was up late last night....
>>>
>>>
>>>
>>> -Nick
>>>
>>>
>>>
>>> On Wed, Aug 5, 2015 at 10:52 AM, Bill Kasdorf
>>> <bkasdorf@apexcovantage.com> wrote:
>>>
>>> Good point! Thanks for the reminder; that's an aspect I often forget to
>>> mention.
>>>
>>> One related comment: the motivation behind those author-driven page
>>> breaks is usually relationship/association/containment-driven: "keep this
>>> with that," "the reader/user/student needs to see this, that, and this
>>> other thing at the same time," etc.
>>>
>>> JATS/BITS, the dominant XML model in the scholarly publishing world, has
>>> a pair of <milestone> elements for precisely that purpose. Every
>>> <milestone> has a @rationale attribute for saying what it's for, and there
>>> are actually two such elements, <milestone-start/> and <milestone-end/>,
>>> empty marker elements that enable getting around the well-formedness
>>> barrier often encountered in this context, where the starting and ending
>>> points are at arbitrary locations that don't necessarily align with the
>>> logical structure of the document. Very useful.
>>>
>>> Thanks again for mentioning the author-driven breaks!
>>>
>>> --Bill K
>>>
>>>
>>> -----Original Message-----
>>> From: Leonard Rosenthol [mailto:lrosenth@adobe.com]
>>> Sent: Wednesday, August 05, 2015 10:15 AM
>>> To: Bill Kasdorf; Ivan Herman; Kaveh Bazargan
>>> Cc: Johannes Wilm; Dave Cramer; W3C Digital Publishing Discussion list;
>>> Matthew Hardy
>>> Subject: Re: Prioritisation
>>>
>>> Bill, as with various times in this thread, I completely agree with you
>>> concerning references into content.
>>>
>>> However, I think there is a part of this that is being overlooked and is
>>> quite important.  Author-forced page breaks due to (usually but not always)
>>> changes in some semantic element.  In a book, this would be chapter breaks
>>> (so a new chapter starts at the top of a new page), but in a magazine it
>>> could be an article, or before tables in a scientific paper or …
>>>  Fortunately, there is work being done in this area in CSS.  Unfortunately,
>>> there is zero support amongst the various UA’s to support it.
>>>
>>> Leonard
>>>
>>>
>>>
>>>
>>>
>>> On 8/5/15, 10:00 AM, "Bill Kasdorf" <bkasdorf@apexcovantage.com>
>>> wrote:
>>>
>>> >To be clear, since I've long been an advocate of page markers: I view
>>> them as necessary _at the present time_ but a blunt and arbitrary
>>> instrument.
>>> >
>>> >I would even go so far (perhaps surprising, coming from me) to call
>>> them brain-dead. The reason is that they have absolutely no relationship to
>>> the inherent structure and content of the document. They just mark where
>>> pages happened to break, in a particular rendering, which is based on a
>>> designer having decided on a certain complement of specifications (page
>>> size, margins, font size, leading, spacing, etc.) that resulted in a
>>> certain amount of content fitting on a given page, having started at an
>>> arbitrary point based on what happened to fit on the previous page. That's
>>> all they are. But we need the dumb things.
>>> >
>>> >Nobody, certainly not me, would argue that this is the best way to
>>> embed reference points in a document. In fact the activity that I am
>>> nominally in charge of on the W3C DPUB IG (nominally because I completely
>>> depend on Ivan for the heavy lifting brain work and knowledge base)
>>> involves looking at how to identify fragments within a document, _without_
>>> having to embed reference points at all. (Sidenote: this fragment
>>> identification actually requires two locations, either explicit or implied.
>>> Mostly implied, because if you can reference the actual structure of the
>>> document, then what you are really doing is pointing at the starting
>>> location of the fragment, the ending location of which is typically
>>> provided by markup, e.g. the end of a <section> or <p> or <span>. But you
>>> also have to be able to define fragments that aren't neatly aligned with
>>> the document structure--either because they're smaller than the markup
>>> delineates or because they overlap the structural components of the
>>> document. That's where the fun starts.)
>>> >
>>> >But I digress. The point is that OF COURSE it is better for the
>>> reference points in the document to be based, at least as a foundation, on
>>> the document structure, down to the paragraph level and any phrase-level
>>> markup that paragraphs might contain. Duh!
>>> >
>>> >The reality of this particular time in history, however, is that there
>>> are still too many situations where the reference to the authoritative
>>> paginated version is still relied upon (back-of-the-book indexes, scholarly
>>> citations, cross references, teachers saying "turn to page 53," etc.). So
>>> we are stuck with our brain-dead page markers. Would we like something
>>> better? You bet! We're working on that. But in the meantime we still have
>>> to have those damn page break markers. It's called the real world.
>>> >
>>> >--Bill K
>>> >
>>> >-----Original Message-----
>>> >From: Ivan Herman [mailto:ivan@w3.org]
>>> >Sent: Wednesday, August 05, 2015 5:58 AM
>>> >To: Kaveh Bazargan
>>> >Cc: Leonard Rosenthol; Johannes Wilm; Bill Kasdorf; Dave Cramer; W3C
>>> Digital Publishing Discussion list; Matthew Hardy
>>> >Subject: Re: Prioritisation
>>> >
>>> >
>>> >> On 05 Aug 2015, at 11:45 , Kaveh Bazargan
>>> <kaveh@rivervalleytechnologies.com> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On 5 August 2015 at 10:36, Ivan Herman <ivan@w3.org> wrote:
>>> >> Leonard,
>>> >>
>>> >> > On 04 Aug 2015, at 21:38 , Leonard Rosenthol <lrosenth@adobe.com>
>>> wrote:
>>> >> >
>>> >> > With the focus here on terminology, I think that we also need to be
>>> careful about what the definition of a “page” is in this context.
>>> >> >
>>> >> > In reading over the various messages here, I see (at least) three
>>> different definitions.
>>> >> >
>>> >> > 1 – The content that fits on the device’s screen/output without
>>> requiring any scrolling.
>>> >> > 2 – The content that maps to a semantic concept in the publication
>>> (eg. Index, chapter, article, etc.) and may require scrolling
>>> >> > 3 – The content that maps to the printed or fixed layout
>>> representation.
>>> >> >
>>> >>
>>> >> I like this differentiation, and I would think that #2 is indeed very
>>> important but we may want to, eventually, completely dissociate it from the
>>> concept of paging.
>>> >>
>>> >> My understanding is that publishers, these days, put some sort of a
>>> page mark into the digital output (in the form of an invisible element, of
>>> a metadata, etc.). The purpose of this is to be able to *link* (either
>>> conceptually or through real hyperlinks) into the document. It is obviously
>>> important for various use cases that came up in this thread already (and
>>> others) like academic reference or classroom usage. But, just as you say,
>>> handling these may require scrolling and that because the concept of these
>>> anchors are, actually, orthogonal to display, ie, pages in terms of #1 and
>>> #3.
>>> >>
>>> >> I think there is an interesting discussion to have on where anchors
>>> should be put, what is the granularity of those, can (in future) some sort
>>> of a robust anchoring approach take over the need for these anchors, etc.
>>> It is largely a usability issue, but I think it is better if we separate it
>>> from the concept of pagination…
>>> >>
>>> >> [...]
>>> >>
>>> >> I think we would make progress so much faster if we could refer to
>>> chunks of information (e.g. paragraphs, as lawyers do now) rather than the
>>> 100s of years old physical page model, which is a really too big a target
>>> anyway. But the publishing industry is not exactly forward looking, so I
>>> guess we'll be stuck for a few more decades with having physical pages as
>>> the "version of record". :-(
>>> >
>>> >Let us be optimistic!:-)
>>> >
>>> >Whether paragraphs are the right chunks, or sections, or something
>>> else: I really do not know. Actually… there may not be a universal answer:
>>> what is necessary for legal documents (and which would probably work well
>>> for, say, scholarly publishing, eg, in humanities, where page references
>>> are ubiquitous) may be an overkill for novels. (Think of the number of
>>> references you would have in the "War and Peace" :-)
>>> >
>>> >But yes, putting markers (and/or providing means to links) into chunks
>>> of information is the right abstraction.
>>> >
>>> >Ivan
>>> >
>>> >
>>> >>
>>> >>
>>> >> --
>>> >> Kaveh Bazargan
>>> >> Director
>>> >> River Valley Technologies
>>> >> @kaveh1000
>>> >> +44 7771 824 111
>>> >> www.rivervalleytechnologies.com
>>> >> www.bazargan.org
>>> >
>>> >
>>> >----
>>> >Ivan Herman, W3C
>>> >Digital Publishing Activity Lead
>>> >Home: http://www.w3.org/People/Ivan/
>>> >mobile: +31-641044153
>>> >ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> - Nick Ruffilo
>>>
>>> @NickRuffilo
>>>
>>> http://Aerbook.com
>>>
>>> http://ZenOfTechnology.com
>>>
>>>
>>>
>>
>>
>
>
> --
> - Nick Ruffilo
> @NickRuffilo
> http://Aerbook.com
> http://ZenOfTechnology.com
>
>
Received on Wednesday, 5 August 2015 17:55:30 UTC