RE: [dpub identifiers] Please review updated Identifiers TF wiki from George Kerscher on 2015-04-08 (public-digipub-ig@w3.org from April 2015)

From: George Kerscher <kerscher@montana.com>
Date: Wed, 8 Apr 2015 09:07:20 -0600
To: "'Bill Kasdorf'" <bkasdorf@apexcovantage.com>, "'Ivan Herman'" <ivan@w3.org>, "'Stein, Ayla'" <astein@illinois.edu>
Cc: "'Thierry Michel'" <tmichel@w3.org>, "'W3C Digital Publishing IG'" <public-digipub-ig@w3.org>
Message-ID: <007501d0720d$b6cfaff0$246f0fd0$@montana.com>
+1 for Bill's post/George

-----Original Message-----
From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com] 
Sent: Wednesday, April 08, 2015 8:55 AM
To: Ivan Herman; Stein, Ayla
Cc: Thierry Michel; W3C Digital Publishing IG
Subject: RE: [dpub identifiers] Please review updated Identifiers TF wiki

This issue is mainly pertinent to publications originally published in print
and only later provided in digital form. There are of course millions of
such publications in libraries, which is the main domain of the HathiTrust.

The reason this is important is that there are four primary use cases
characteristic of this "print is the version of record" situation:

--The indexes in print books typically (though not universally) point to
arbitrary points in the content: the print page breaks.
--Cross-references in the text of print books typically refer to print page
breaks.
--Citations in the literature (very important in scholarship) point to print
page breaks.
--The accessibility community strongly advocates the recording of print page
breaks in digital versions of print publications, particularly textbooks, so
that when the teacher says "turn to page 53" the print-disabled user can
find that spot (as can any user of the digital version).

While most W3C folks would argue that this is a relic of print-based
publishing (and it is), and would argue that these should be replaced with
real links to meaningful points in the content, not to something as
arbitrary as a print page break (which is indisputably better), it
unfortunately happens to be a real need when we are in this transitional
phase; and all of those millions of old books, and the citations to their
pages, do actually exist. So it really does turn out to be useful to have
"markers" in a digital file designating where the print page breaks
are--accompanied, btw, with an ability to designate _which_ print edition
the markers refer to.

As distasteful as that is to digerati like us. ;-)

And btw, in the context of EPUB-WEB, for these very reasons (especially the
accessibility issue), providing such print page break markers is recommended
in the EDUPUB spec, which provides a recommended syntax for the marker. It
doesn't attempt to contain the page with a start-and-end-tag pair, because
you run into well-formedness issues; instead, it just provides an empty
element that says, in effect, "page 53 in the print book starts here."
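
As a rough illustration of the empty-marker idea, here is a minimal Python sketch that collects such markers from an XHTML content document and maps each print page label to a fragment target. The marker form used here (an empty span carrying epub:type="pagebreak", an id, and a title holding the print page number) is an assumption about what the recommended syntax looks like; the SAMPLE markup and the print_page_map helper are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the epub:type attribute in EPUB 3 content documents.
EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical content document with two empty page break markers.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <p>End of the text on print page 52.
      <span epub:type="pagebreak" id="page53" title="53"/>
      Text that begins print page 53.</p>
    <p>More text. <span epub:type="pagebreak" id="page54" title="54"/> And on.</p>
  </body>
</html>"""


def print_page_map(xhtml):
    """Return {print page label: fragment identifier} for each marker found."""
    root = ET.fromstring(xhtml)
    pages = {}
    for el in root.iter():
        # epub:type holds a space-separated list of semantic terms.
        types = el.get("{%s}type" % EPUB_NS, "").split()
        if "pagebreak" in types and el.get("id"):
            # Assumption: the title attribute carries the print page label.
            label = el.get("title", el.get("id"))
            pages[label] = "#" + el.get("id")
    return pages


PAGE_MAP = print_page_map(SAMPLE)
```

With a map like this, "turn to page 53" resolves to the fragment #page53 in the digital file. Which print edition the labels refer to would still have to be recorded separately; this sketch does not cover that.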

--Bill K

-----Original Message-----
From: Ivan Herman [mailto:ivan@w3.org] 
Sent: Wednesday, April 08, 2015 4:30 AM
To: Stein, Ayla
Cc: Thierry Michel; Bill Kasdorf; W3C Digital Publishing IG
Subject: Re: [dpub identifiers] Please review updated Identifiers TF wiki

Thank you Ayla.

Without going into the details of the proposal, the question it raises to
me, as part of the EPUB-WEB discussion, is what is the role (if any) of an
identifier that identifies a *page*. Indeed, depending on the style of the
online document, a page is

* a very ephemeral entity and thereby it is not really a suitable target for
an identifier (a flowing book, whose pagination is based on user
interaction, is the obvious example)
* a fixed entity, i.e., for a fixed-layout document

It strikes me that an identifier approach for an EPUB-WEB document needs to
cover the second item, too. AFAIK, CFI can do that only if the fixed layout
document is organized in terms of a series of separate files within the
package, but that may not cover all the cases (e.g., if a presentation slide
show is stored as a portable document, and the 'pagination' is the result of
a javascript running on one single source).

Whether the approach taken by the HathiTrust document is the right one (as
far as I could understand from a cursory look, it assigns a UUID-type URN to
each page, which is then combined with the identifier of a 'volume') is a
different question. I am not sure this is a general solution but I guess the
more general questions are certainly valid!

Thanks again

Ivan


> On 07 Apr 2015, at 20:21 , Stein, Ayla <astein@illinois.edu> wrote:
> 
> Matt's comment about content version reminded me of some ongoing work at
> the HathiTrust Research Center. One of the problems they're looking into is
> identifying an object at a specific point in time. Their initial proposal
> document discusses several different issues regarding identifiers in HTRC
> and can be accessed here: https://www.ideals.illinois.edu/handle/2142/73147.
> I've also added it as an attachment to this email.
> 
> I know there's also been some work on a prototype for identifying
> versions, but the draft of that document is not yet available for
> circulation. While these aren't necessarily solutions that can be
> implemented here, I think it's of interest and relevance to this discussion.
> 
> Thanks,
> 
> Ayla
> 
> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: Tuesday, March 24, 2015 3:32 AM
> To: Thierry Michel
> Cc: Bill Kasdorf; W3C Digital Publishing IG
> Subject: Re: [dpub identifiers] Please review updated Identifiers TF 
> wiki
> 
> 
>> On 24 Mar 2015, at 09:30 , Ivan Herman <ivan@w3.org> wrote:
>> 
>> I have added the media fragment URI to the wiki with a few examples.
>> Thierry, if you want to add something, please do at:
> 
> Sorry, pushed the send button too soon:
> 
> https://www.w3.org/dpub/IG/wiki/Task_Forces/identifiers#W3C.E2.80.99s_Media_Fragment
> 
> Thanks
> 
> ivan
> 
>> 
>> 
>>> On 23 Mar 2015, at 08:20 , Thierry MICHEL <tmichel@w3.org> wrote:
>>> 
>>> Bill,
>>> 
>>> I would also suggest Media Fragments URI 1.0. It specifies the syntax
>>> for constructing media fragment URIs and explains how to handle them
>>> when used over the HTTP protocol.
>>> 
>>> http://www.w3.org/TR/2012/REC-media-frags-20120925/
>>> a W3C Recommendation 25 September 2012.
>>> 
>>> Best,
>>> 
>>> thierry.
>>> 
>>> On 22/03/2015 17:51, Bill Kasdorf wrote:
>>>> Thanks to Tzviya, we have some substantive content for review on 
>>>> the Identifiers TF wiki at [1].
>>>> 
>>>> This initial draft of background information gives brief 
>>>> descriptions, links, discussion, and examples of three possible 
>>>> options for consideration as the basis for our initial work on a
>>>> Fragment Identifier:
>>>> 
>>>> --EPUB CFI
>>>> 
>>>> --W3C Packaging for the Web Fragment Identifiers
>>>> 
>>>> --The Open Annotations Fragment Selector
>>>> 
>>>> In addition, there's a placeholder for XPath, and we need to 
>>>> collect suggestions for other relevant specs or technologies to 
>>>> take into account, e.g. XPointer.
>>>> 
>>>> Please take a look at this before the Monday IG call and suggest 
>>>> any others we should add. Feel free to add a placeholder (ideally
>>>> with a link) if you aren't prepared to add the prose.
>>>> 
>>>> And although we now have a good list of participants in this TF, 
>>>> please add your name if you'd like to participate as well. We will 
>>>> discuss next steps on the call Monday, which will probably involve
>>>> a TF conference call later this week if we can find a time that works
>>>> for everybody.
>>>> 
>>>> --Bill K
>>>> 
>>>> [1]
>>>> https://www.w3.org/dpub/IG/wiki/Task_Forces/identifiers#Background
>>>> 
>>>> Bill Kasdorf
>>>> 
>>>> Vice President, Apex Content Solutions
>>>> 
>>>> Apex CoVantage
>>>> 
>>>> W: +1 734-904-6252
>>>> 
>>>> M: +1 734-904-6252
>>>> 
>>>> @BillKasdorf <http://twitter.com/#!/BillKasdorf> //
>>>> 
>>>> bkasdorf@apexcovantage.com
>>>> 
>>>> ISNI: 0000 0001 1649 0786
>>>> 
>>>> https://orcid.org/0000-0001-7002-4786
>>>> 
>>>> www.apexcovantage.com <http://www.apexcovantage.com/>
>>>> 
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>> 
>> 
>> 
>> 
> 
> 
> <IdentifiersProposal.pdf>


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Wednesday, 8 April 2015 15:08:22 UTC