Re: [dpub identifiers] Please review updated Identifiers TF wiki

Dear Ivan,

In EPUB3 files, HTML content is tagged with empty anchors like this:
"Ils n'en veulent pas, ils n'en <a id="page_182"/>veulent pas, elle lâche
dans un soupir en attrapant encore une lettre."

This means that a new paper page starts at the word « veulent ».
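
To make the marker convention concrete, here is a minimal Python sketch
(not our production tooling) that lists the empty page anchors in a
content fragment together with the first word of each new paper page.
The wrapping <p> element, the XHTML namespace declaration, the
"page_<number>" id pattern and the function name find_page_breaks are
assumptions made for this illustration only.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical content fragment modelled on the sentence quoted above.
# The enclosing <p> and its namespace are assumptions for the sketch.
SAMPLE = (
    '<p xmlns="http://www.w3.org/1999/xhtml">'
    "Ils n'en veulent pas, ils n'en "
    '<a id="page_182"/>'
    "veulent pas, elle lâche dans un soupir en attrapant encore une lettre."
    "</p>"
)

# Assumed id convention: "page_" followed by the print page number.
PAGE_ID = re.compile(r"^page_(\w+)$")


def find_page_breaks(xhtml):
    """Return (page_number, first_word_of_page) for each empty page anchor."""
    root = ET.fromstring(xhtml)
    breaks = []
    for anchor in root.iter("{%s}a" % XHTML_NS):
        match = PAGE_ID.match(anchor.get("id", ""))
        # Only empty anchors count as markers: no text and no child elements.
        if match is None or anchor.text or len(anchor):
            continue
        # The text following the empty anchor is stored in its "tail".
        tail_words = (anchor.tail or "").split()
        first_word = tail_words[0] if tail_words else ""
        breaks.append((match.group(1), first_word))
    return breaks


page_breaks = find_page_breaks(SAMPLE)
print(page_breaks)  # [('182', 'veulent')]
```

The sketch only reads the text that directly follows the anchor, so a
marker placed immediately before another element would report an empty
first word.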

In parallel, the EPUB3 nav document contains an ordered list of navigation
points in a <nav epub:type="page-list"> element:
<li>
   <a href="chap22.html#page_182">Page 182</a>
</li>
So the label of this paper page 182 is « Page 182 ».
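
As an illustration of how a reading system or a checking script might
use that list, here is a small Python sketch that turns a page-list nav
into a lookup from page label to (file, fragment id). It is only a
sketch under assumptions: the surrounding <nav>/<ol> wrapper, the
"Page 181" entry and the function name read_page_list are hypothetical;
only the "Page 182" entry comes from the example above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical nav fragment; only the "Page 182" entry is from the example.
NAV = """\
<nav xmlns="http://www.w3.org/1999/xhtml"
     xmlns:epub="http://www.idpf.org/2007/ops"
     epub:type="page-list">
  <ol>
    <li><a href="chap22.html#page_181">Page 181</a></li>
    <li><a href="chap22.html#page_182">Page 182</a></li>
  </ol>
</nav>
"""


def read_page_list(nav_xml):
    """Map each page label to a (content file, fragment id) pair."""
    root = ET.fromstring(nav_xml)
    pages = {}
    for nav in root.iter("{%s}nav" % XHTML_NS):
        # epub:type may carry several space-separated values.
        types = nav.get("{%s}type" % EPUB_NS, "").split()
        if "page-list" not in types:
            continue
        for link in nav.iter("{%s}a" % XHTML_NS):
            label = "".join(link.itertext()).strip()
            file_name, _, fragment = link.get("href", "").partition("#")
            pages[label] = (file_name, fragment)
    return pages


page_list = read_page_list(NAV)
print(page_list["Page 182"])  # ('chap22.html', 'page_182')
```

The fragment part of each href is the id of the empty anchor in the
content file, which is what ties the label to the print page break.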


In terms of workflow, as good practice, we produce a new EPUB file as soon
as text corrections have been inserted in the reprinted book.

Best,
Luc




Luc Audrain
Hachette Livre
Direction Innovation et Technologie Numérique
11, rue Paul Bert, 92240 Malakoff
Fixe : +33 (0) 1 4123 6370
Mobile : +33 (0) 6 48 38 21 41





On 08/04/2015 18:27, "Ivan Herman" <ivan@w3.org> wrote:

>In spite of being a W3C digerati, ie, the worst possible sort:-), I do
>understand...
>
>It is good that this came up: it must be recorded as a requirement...
>
>But how does it work, eg, in EDUPUB? Does it mean that the, say, HTML
>file contains some non-visible spans with ID-s, where the ID somehow
>reflects the page number of the print version? And what happens if there
>is a new printed version (but no new digital version)?
>
>Ivan
>
>
>> On 08 Apr 2015, at 16:55 , Bill Kasdorf <bkasdorf@apexcovantage.com>
>>wrote:
>> 
>> This issue is mainly pertinent to publications originally published in
>>print and only later provided in digital form. There are of course
>>millions of such publications in libraries, which is the main domain of
>>the HathiTrust.
>> 
>> The reason this is important is that there are four primary use cases
>>characteristic of this "print is the version of record" situation:
>> 
>> --The indexes in print books typically (though not universally) point
>>to arbitrary points in the content: the print page breaks.
>> --Cross-references in the text of print books typically refer to print
>>page breaks.
>> --Citations in the literature (very important in scholarship) point to
>>print page breaks.
>> --The accessibility community strongly advocates the recording of print
>>page breaks in digital versions of print publications, particularly
>>textbooks, so that when the teacher says "turn to page 53" the
>>print-disabled user can find that spot (as can any user of the digital
>>version).
>> 
>> While most W3C folks would argue that this is a relic of print-based
>>publishing (and it is), and would argue that these should be replaced
>>with real links to meaningful points in the content, not to something as
>>arbitrary as a print page break (which is indisputably better), it
>>unfortunately happens to be a real need when we are in this transitional
>>phase; and all of those millions of old books, and the citations to
>>their pages, do actually exist. So it really does turn out to be useful
>>to have "markers" in a digital file designating where the print page
>>breaks are--accompanied, btw, with an ability to designate _which_ print
>>edition the markers refer to.
>> 
>> As distasteful as that is to digerati like us. ;-)
>> 
>> And btw, in the context of EPUB-WEB, for these very reasons (especially
>>the accessibility issue), providing such print page break markers is
>>recommended in the EDUPUB spec, which provides a recommended syntax for
>>the marker. It doesn't attempt to contain the page with a
>>start-and-end-tag pair, because you run into well-formedness issues;
>>instead, it just provides an empty element that says, in effect, "page
>>53 in the print book starts here."
>> 
>> --Bill K
>> 
>> -----Original Message-----
>> From: Ivan Herman [mailto:ivan@w3.org]
>> Sent: Wednesday, April 08, 2015 4:30 AM
>> To: Stein, Ayla
>> Cc: Thierry Michel; Bill Kasdorf; W3C Digital Publishing IG
>> Subject: Re: [dpub identifiers] Please review updated Identifiers TF
>>wiki
>> 
>> Thank you Ayla.
>> 
>> Without going into the details of the proposal, the question it raises
>>to me, as part of the EPUB-WEB discussion, is what is the role (if any)
>>of an identifier that identifies a *page*. Indeed, depending on the
>>style of the online document, a page is
>> 
>> * a very ephemeral entity and thereby it is not really a suitable
>>target for an identifier (a flowing book, whose pagination is based on
>>user interaction, is the obvious example)
>> * a fixed entity, ie, for a fixed-layout document
>> 
>> it strikes me that an identifier approach for an EPUB-WEB document
>>needs to cover the second item, too. AFAIK, CFI can do that only if the
>>fixed layout document is organized in terms of a series of separate
>>files within the package, but that may not cover all the cases (e.g., if
>>a presentation slide show is stored as a portable document, and the
>>'pagination' is the result of a javascript running on one single source).
>> 
>> Whether the approach taken by the HathiTrust document is the right one
>>(as far as I could understand from a cursory look it assigns a UDDI type
>>URN to each page, which is then combined with the identifier of a
>>'volume') is a different question. I am not sure this is a general
>>solution but I guess the more general questions are certainly valid!
>> 
>> Thanks again
>> 
>> Ivan
>> 
>> 
>>> On 07 Apr 2015, at 20:21 , Stein, Ayla <astein@illinois.edu> wrote:
>>> 
>>> Matt's comment about content version reminded me of some ongoing work
>>>at the HathiTrust Research Center. One of the problems they're looking
>>>into is identifying an object at a specific point in time. Their
>>>initial proposal document discusses several different issues regarding
>>>identifiers in HTRC and can be accessed here:
>>>https://www.ideals.illinois.edu/handle/2142/73147. I've also added it
>>>as an attachment to this email.
>>> 
>>> I know there's also been some work on a prototype for identifying
>>>versions, but the draft of that document is not yet available for
>>>circulation. While these aren't necessarily solutions that can be
>>>implemented here, I think it's of interest and relevance to this
>>>discussion.
>>> 
>>> Thanks,
>>> 
>>> Ayla
>>> 
>>> -----Original Message-----
>>> From: Ivan Herman [mailto:ivan@w3.org]
>>> Sent: Tuesday, March 24, 2015 3:32 AM
>>> To: Thierry Michel
>>> Cc: Bill Kasdorf; W3C Digital Publishing IG
>>> Subject: Re: [dpub identifiers] Please review updated Identifiers TF
>>> wiki
>>> 
>>> 
>>>> On 24 Mar 2015, at 09:30 , Ivan Herman <ivan@w3.org> wrote:
>>>> 
>>>> I have added the media fragment URI to the wiki with few examples.
>>>>Thierry, if you want to add something, please do at:
>>> 
>>> Sorry, pushed the send button too soon:
>>> 
>>> https://www.w3.org/dpub/IG/wiki/Task_Forces/identifiers#W3C.E2.80.99s_Media_Fragment
>>> 
>>> Thanks
>>> 
>>> ivan
>>> 
>>>> 
>>>> 
>>>>> On 23 Mar 2015, at 08:20 , Thierry MICHEL <tmichel@w3.org> wrote:
>>>>> 
>>>>> Bill,
>>>>> 
>>>>> I would also suggest Media Fragments URI 1.0. It specifies the syntax
>>>>> for constructing media fragment URIs and explains how to handle them
>>>>> when used over the HTTP protocol.
>>>>> 
>>>>> http://www.w3.org/TR/2012/REC-media-frags-20120925/
>>>>> a W3C Recommendation 25 September 2012.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> thierry.
>>>>> 
>>>>> On 22/03/2015 17:51, Bill Kasdorf wrote:
>>>>>> Thanks to Tzviya, we have some substantive content for review on
>>>>>> the Identifiers TF wiki at [1].
>>>>>> 
>>>>>> This initial draft of background information gives brief
>>>>>> descriptions, links, discussion, and examples of three possible
>>>>>> options for consideration as the basis for our initial work on a
>>>>>>Fragment Identifier:
>>>>>> 
>>>>>> --EPUB CFI
>>>>>> 
>>>>>> --W3C Packaging for the Web Fragment Identifiers
>>>>>> 
>>>>>> --The Open Annotations Fragment Selector
>>>>>> 
>>>>>> In addition, there's a placeholder for XPath, and we need to
>>>>>> collect suggestions for other relevant specs or technologies to
>>>>>> take into account, e.g. XPointer.
>>>>>> 
>>>>>> Please take a look at this before the Monday IG call and suggest
>>>>>> any others we should add. Feel free to add a placeholder (ideally
>>>>>> with a link) if you aren't prepared to add the prose.
>>>>>> 
>>>>>> And although we now have a good list of participants in this TF,
>>>>>> please add your name if you'd like to participate as well. We will
>>>>>> discuss next steps on the call Monday, which will probably involve
>>>>>> a TF conference call later this week if we can find a time that
>>>>>>works for everybody.
>>>>>> 
>>>>>> --Bill K
>>>>>> 
>>>>>> [1]
>>>>>> https://www.w3.org/dpub/IG/wiki/Task_Forces/identifiers#Background
>>>>>> 
>>>>>> Bill Kasdorf
>>>>>> 
>>>>>> Vice President, Apex Content Solutions
>>>>>> 
>>>>>> Apex CoVantage
>>>>>> 
>>>>>> W: +1 734-904-6252
>>>>>> 
>>>>>> M: +1 734-904-6252
>>>>>> 
>>>>>> @BillKasdorf <http://twitter.com/#!/BillKasdorf> //
>>>>>> 
>>>>>> bkasdorf@apexcovantage.com
>>>>>> 
>>>>>> ISNI: 0000 0001 1649 0786
>>>>>> 
>>>>>> https://orcid.org/0000-0001-7002-4786
>>>>>> <https://orcid.org/0000-0001-7002-4786?lang=en>
>>>>>> 
>>>>>> www.apexcovantage.com <http://www.apexcovantage.com/>
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> <IdentifiersProposal.pdf>
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>
>
>
>
>
>

Received on Wednesday, 8 April 2015 16:42:14 UTC