Re: [dpub identifiers] Please review updated Identifiers TF wiki from Ivan Herman on 2015-04-08 (public-digipub-ig@w3.org from April 2015)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 8 Apr 2015 18:27:20 +0200
To: Bill Kasdorf <bkasdorf@apexcovantage.com>
Cc: "Stein, Ayla" <astein@illinois.edu>, Thierry Michel <tmichel@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <11398F9C-4602-4C32-BA48-593219A99F67@w3.org>
In spite of being one of the W3C digerati, i.e., the worst possible sort :-), I do understand...

It is good that this came up: it must be recorded as a requirement...

But how does it work, e.g., in EDUPUB? Does it mean that, say, the HTML file contains some non-visible spans with IDs, where each ID somehow reflects a page number of the print version? And what happens if there is a new printed version (but no new digital version)?
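
To make the question concrete, here is a minimal sketch of how such markers could be located in a content document. It assumes the markers are empty span elements carrying epub:type="pagebreak" (a term from the EPUB 3 Structural Semantics Vocabulary), with the print page number in a title attribute and an id to link to. The sample markup, the id values, and the function name page_break_targets are illustrative assumptions, not taken from the EDUPUB text, and the sketch does not model which print edition the markers refer to.

```python
import xml.etree.ElementTree as ET

# Namespace of the epub:type attribute in EPUB 3 content documents.
EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical content document: two empty markers, each saying in effect
# "print page N starts here". The ids and titles are made up for this sketch.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <p>End of the text on print page 52.
       <span epub:type="pagebreak" id="page53" title="53"/>
       Start of the text on print page 53.</p>
    <p>More text. <span epub:type="pagebreak" id="page54" title="54"/> And more.</p>
  </body>
</html>"""


def page_break_targets(xhtml):
    """Map each print page label to the fragment identifier of its marker."""
    root = ET.fromstring(xhtml)
    targets = {}
    for element in root.iter():
        # epub:type holds a space-separated list of semantic terms.
        types = element.get("{%s}type" % EPUB_NS, "").split()
        marker_id = element.get("id")
        if "pagebreak" in types and marker_id:
            # Assumption: the title attribute carries the print page number;
            # fall back to the id when it is absent.
            label = element.get("title") or marker_id
            targets[label] = "#" + marker_id
    return targets


if __name__ == "__main__":
    print(page_break_targets(SAMPLE))
```

With a map like this, a print citation to "page 53" could be resolved to a fragment of the digital file. A new printed edition with different page breaks would presumably need its own set of markers, which is where the open question lies.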

Ivan


> On 08 Apr 2015, at 16:55 , Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:
> 
> This issue is mainly pertinent to publications originally published in print and only later provided in digital form. There are of course millions of such publications in libraries, which is the main domain of the HathiTrust.
> 
> The reason this is important is that there are four primary use cases characteristic of this "print is the version of record" situation:
> 
> --The indexes in print books typically (though not universally) point to arbitrary points in the content: the print page breaks.
> --Cross-references in the text of print books typically refer to print page breaks.
> --Citations in the literature (very important in scholarship) point to print page breaks.
> --The accessibility community strongly advocates the recording of print page breaks in digital versions of print publications, particularly textbooks, so that when the teacher says "turn to page 53" the print-disabled user can find that spot (as can any user of the digital version).
> 
> While most W3C folks would argue that this is a relic of print-based publishing (and it is), and would argue that these should be replaced with real links to meaningful points in the content, not to something as arbitrary as a print page break (which is indisputably better), it unfortunately happens to be a real need when we are in this transitional phase; and all of those millions of old books, and the citations to their pages, do actually exist. So it really does turn out to be useful to have "markers" in a digital file designating where the print page breaks are--accompanied, btw, with an ability to designate _which_ print edition the markers refer to.
> 
> As distasteful as that is to digerati like us. ;-)
> 
> And btw, in the context of EPUB-WEB, for these very reasons (especially the accessibility issue), providing such print page break markers is recommended in the EDUPUB spec, which provides a recommended syntax for the marker. It doesn't attempt to contain the page with a start-and-end-tag pair, because you run into well-formedness issues; instead, it just provides an empty element that says, in effect, "page 53 in the print book starts here."
> 
> --Bill K
> 
> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: Wednesday, April 08, 2015 4:30 AM
> To: Stein, Ayla
> Cc: Thierry Michel; Bill Kasdorf; W3C Digital Publishing IG
> Subject: Re: [dpub identifiers] Please review updated Identifiers TF wiki
> 
> Thank you Ayla.
> 
> Without going into the details of the proposal, the question it raises to me, as part of the EPUB-WEB discussion, is what is the role (if any) of an identifier that identifies a *page*. Indeed, depending on the style of the online document, a page is
> 
> * a very ephemeral entity and thereby it is not really a suitable target for an identifier (a flowing book, whose pagination is based on user interaction, is the obvious example)
> * a fixed entity, i.e., for a fixed-layout document
> 
> It strikes me that an identifier approach for an EPUB-WEB document needs to cover the second item, too. AFAIK, CFI can do that only if the fixed-layout document is organized as a series of separate files within the package, but that may not cover all the cases (e.g., if a presentation slide show is stored as a portable document, and the 'pagination' is the result of JavaScript running on one single source).
> 
> Whether the approach taken by the HathiTrust document is the right one (as far as I could understand from a cursory look, it assigns a UUID-type URN to each page, which is then combined with the identifier of a 'volume') is a different question. I am not sure this is a general solution, but I guess the more general questions are certainly valid!
> 
> Thanks again
> 
> Ivan
> 
> 
>> On 07 Apr 2015, at 20:21 , Stein, Ayla <astein@illinois.edu> wrote:
>> 
>> Matt's comment about content version reminded me of some ongoing work at the HathiTrust Research Center. One of the problems they're looking into is identifying an object at a specific point in time. Their initial proposal document discusses several different issues regarding identifiers in HTRC and can be accessed here: https://www.ideals.illinois.edu/handle/2142/73147. I've also added it as an attachment to this email.
>> 
>> I know there's also been some work on a prototype for identifying versions, but the draft of that document is not yet available for circulation. While these aren't necessarily solutions that can be implemented here, I think it's of interest and relevance to this discussion.
>> 
>> Thanks,
>> 
>> Ayla
>> 
>> -----Original Message-----
>> From: Ivan Herman [mailto:ivan@w3.org]
>> Sent: Tuesday, March 24, 2015 3:32 AM
>> To: Thierry Michel
>> Cc: Bill Kasdorf; W3C Digital Publishing IG
>> Subject: Re: [dpub identifiers] Please review updated Identifiers TF wiki
>> 
>> 
>>> On 24 Mar 2015, at 09:30 , Ivan Herman <ivan@w3.org> wrote:
>>> 
>>> I have added the media fragment URI to the wiki with a few examples. Thierry, if you want to add something, please do at:
>> 
>> Sorry, pushed the send button too soon:
>> 
>> https://www.w3.org/dpub/IG/wiki/Task_Forces/identifiers#W3C.E2.80.99s_Media_Fragment
>> 
>> Thanks
>> 
>> ivan
>> 
>>> 
>>> 
>>>> On 23 Mar 2015, at 08:20 , Thierry MICHEL <tmichel@w3.org> wrote:
>>>> 
>>>> Bill,
>>>> 
>>>> I would also suggest Media Fragments URI 1.0. It specifies the syntax
>>>> for constructing media fragment URIs and explains how to handle them when used over the HTTP protocol.
>>>> 
>>>> http://www.w3.org/TR/2012/REC-media-frags-20120925/
>>>> a W3C Recommendation 25 September 2012.
>>>> 
>>>> Best,
>>>> 
>>>> thierry.
>>>> 
>>>> On 22/03/2015 17:51, Bill Kasdorf wrote:
>>>>> Thanks to Tzviya, we have some substantive content for review on
>>>>> the Identifiers TF wiki at [1].
>>>>> 
>>>>> This initial draft of background information gives brief
>>>>> descriptions, links, discussion, and examples of three possible
>>>>> options for consideration as the basis for our initial work on a Fragment Identifier:
>>>>> 
>>>>> --EPUB CFI
>>>>> 
>>>>> --W3C Packaging for the Web Fragment Identifiers
>>>>> 
>>>>> --The Open Annotations Fragment Selector
>>>>> 
>>>>> In addition, there's a placeholder for XPath, and we need to
>>>>> collect suggestions for other relevant specs or technologies to
>>>>> take into account, e.g. XPointer.
>>>>> 
>>>>> Please take a look at this before the Monday IG call and suggest
>>>>> any others we should add. Feel free to add a placeholder (ideally
>>>>> with a link) if you aren't prepared to add the prose.
>>>>> 
>>>>> And although we now have a good list of participants in this TF,
>>>>> please add your name if you'd like to participate as well. We will
>>>>> discuss next steps on the call Monday, which will probably involve
>>>>> a TF conference call later this week if we can find a time that works for everybody.
>>>>> 
>>>>> --Bill K
>>>>> 
>>>>> [1]
>>>>> https://www.w3.org/dpub/IG/wiki/Task_Forces/identifiers#Background
>>>>> 
>>>>> Bill Kasdorf
>>>>> 
>>>>> Vice President, Apex Content Solutions
>>>>> 
>>>>> Apex CoVantage
>>>>> 
>>>>> W: +1 734-904-6252
>>>>> 
>>>>> M: +1 734-904-6252
>>>>> 
>>>>> @BillKasdorf <http://twitter.com/#!/BillKasdorf>
>>>>> 
>>>>> bkasdorf@apexcovantage.com
>>>>> 
>>>>> ISNI: 0000 0001 1649 0786
>>>>> 
>>>>> https://orcid.org/0000-0001-7002-4786
>>>>> <https://orcid.org/0000-0001-7002-4786?lang=en>
>>>>> 
>>>>> www.apexcovantage.com <http://www.apexcovantage.com/>
>>>>> 
>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> 
>>> 
>>> 
>>> 
>> 
>> <IdentifiersProposal.pdf>
> 


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Wednesday, 8 April 2015 16:27:31 UTC