Re: As an aside, a possibly interesting read.... from Laura Dawson on 2014-09-25 (public-digipub@w3.org from September 2014)

From: Laura Dawson <Laura.Dawson@bowker.com>
Date: Thu, 25 Sep 2014 15:06:25 +0000
To: Bill Kasdorf <bkasdorf@apexcovantage.com>, "Todd Carpenter (Gmail)" <tcarpenter@niso.org>, Koji Ishii <kojiishi@gluesoft.co.jp>
CC: Ivan Herman <ivan@w3.org>, "David (Standards) Singer" <singer@apple.com>, Laura Dawson <ljndawson@gmail.com>, Graham Bell <graham@editeur.org>, "Phil Madans" <Phil.Madans@hbgusa.com>, "W3C Public Digital Publishing IG Mailing List" <public-digipub-ig-comment@w3.org>
Message-ID: <D049A79E.805F3%laura.dawson@bowker.com>
Amen, Bill. I could not agree more.

On 9/25/14, 11:04 AM, "Bill Kasdorf" <bkasdorf@apexcovantage.com> wrote:

>I also want to point out that what we really need is not just about books.
>
>Even though there has been frequent discussion on the IG about whether we
>can _focus_ on books (and the consensus, which I reluctantly went along
>with, is yes), for something this fundamental we really need to think in
>terms of a _publication_ or even a _resource_.
>
>Even in traditionally book-dominated sectors like educational publishing,
>there is a rapid movement away from the concept of a "book" at all.
>Professors increasingly are willing to let students use any of a range of
>"textbooks" as a resource for, say, calculus or microbiology, as long as
>they are useful and have information that is relevant to the course.
>Increasingly those "books" themselves are being deconstructed, and more
>importantly most big educational publishers are moving toward a vision in
>which they develop resources first and books (or parts of books) are just
>one of many ways of associating, combining, and distributing those
>resources. And that is done in the context of _all the other stuff out
>there_ (mostly but not exclusively on the Web).
>
>All that stuff has to be able to be identified, cited, annotated, etc.
>etc.
>
>I could have written that description just as well in the context of
>magazines, for which _exactly the same dynamic_ is happening. Right now.
>
>Same for scholarly/STM publishing (where publishing _data_--and citing
>datasets--is a very live issue). And even in the humanities, where
>"Digital Humanities" is becoming mainstream (and which is about "works"
>in the FRBR sense).
>
>And think of all the resources needed in corporate publishing, training,
>etc.
>
>All of that is "publishing." No publication exists in a closed system. It
>may think it is in a walled garden but there is a giant jungle outside
>its walls.
>
>I really think in the pursuit of this identifier issue we MUST take the
>broadest possible vision or we will come up with something that is useful
>in one sector (perhaps) but not truly interoperable in the publishing
>ecosystem and the web in general (the context in which the publishing
>ecosystem increasingly lives and works) and will thus ultimately prove
>inadequate.
>
>This is not to replace domain-specific or purpose-built identifiers like
>the DOI, the ISBN, etc.--those that, as Todd and others pointed out, have
>metadata and systems associated with them to DO THINGS. Any identifier we
>come up with should not make those obsolete and ideally should not
>conflict with them at all. It should make them more interoperable and
>more useful. This is not a Battle of Identifiers, and those who think One
>and Only One Identifier is the goal are mistaken. Many identifiers are
>needed because we need to do many different things with them.
>
>But the identifier we are looking for here--enabling annotation and a
>myriad other related things on the Web (citation, previews, chunking,
>etc.)--needs to be radically widely applicable, completely agnostic as to
>the type of publication or resource it identifies, the format in which
>that publication or resource is disseminated, and yet durable,
>persistent, and reliable across formats and across time.
>
>--Bill Kasdorf
>
>-----Original Message-----
>From: Laura Dawson [mailto:Laura.Dawson@bowker.com]
>Sent: Thursday, September 25, 2014 9:01 AM
>To: Todd Carpenter (Gmail); Koji Ishii
>Cc: Ivan Herman; David (Standards) Singer; Laura Dawson; Bill Kasdorf;
>Graham Bell; Phil Madans; W3C Public Digital Publishing IG Mailing List
>Subject: Re: As an aside, a possibly interesting read....
>
>Todd, I think you're absolutely right about the difference between
>librarianship and the trade. It has been the function of libraries to
>archive, curate, and canonize information since their inception. Trade is
>about one thing and one thing only - sales. In building infrastructure,
>we need to support both. What both have in common is a need for effective
>discovery - directing a reader to the book they want. So much of the
>metadata will be shared in common - that which describes the book; the
>metadata describing the terms by which a reader may have it will differ
>depending on... well, the terms - the environment in which the reader is
>discovering the book.
>
>That all said, I can envision a world where - for the purposes of
>curation and archiving - there exists a "canonical" version of a book at
>a URI that could well consist of the ISBN for that book (as Koji
>described), but if you want to own the book, you are directed to
>whichever platforms support it, and you choose which one you want to read
>on. But that presupposes an authority to govern that system. I would say
>the ISBN-International Agency could be that authority, but there is one
>important issue that prevents that - no publisher is required to report
>back to ISBN-IA which ISBNs get assigned to which books. ISBNs are issued
>in blocks - and in the case of larger publishers, many never see the
>light of day. ISBN-IA does not maintain a database of the ISBNs that get
>assigned - that is down to the registration agencies (such as Bowker,
>Nielsen, national libraries). And the publishers don't always report back
>to the RA's which numbers they are assigning to which things.
>
>Also to be considered - in a world of self-publishing, ISBNs frequently
>are not assigned at all. Books are available in proprietary systems only
>(Kindle), and not easily discoverable. Amazon is said to be publishing
>about 2000 of these per week. We have no idea what they are, if they are
>books or "shorts", fiction, memoir, cookbooks - only Amazon has that
>data, and the data is provided by author/publishers who are not
>necessarily familiar with metadata conventions and effective description.
>
>So, to be succinct, whether distributed or centralized, we need to break
>down the specific problems based on audience and the pain we're trying to
>solve. Probably won't be a single solution.
>
>On 9/25/14, 2:58 AM, "Todd Carpenter (Gmail)" <tcarpenter@niso.org> wrote:
>
>>There is a tremendous problem with distributed systems when it comes to
>>canonical information and standard identifiers: the metadata that is
>>associated with the identifier. An identifier is (or, better put,
>>should be) just a dumb (i.e., without embedded meaning), unique string
>>of characters. The structure of that string, while systematically
>>important, is beside the point. Whether an identifier is expressed as a
>>16-digit string, as a URI, or as anything else is not finally the point.
>>
>>The real power is in the associated metadata related to that identifier.
>>While there is tremendous overhead in a centralized system, such systems
>>are critically important to a well-functioning ID system. Without a
>>controlling system, there will be no standard set of associated
>>metadata.  Now, how well that metadata is created, managed, curated and
>>controlled are open questions (as Laura certainly knows), but without
>>some authority driving compliance, there will inevitably be an
>>increasing divergence of metadata quality, practice and interoperability.
>> 
>>
>>Also to Ivan's question about work-level IDs, there is work being done
>>by OCLC to develop a true FRBR Work-level identifier based on their
>>data store of libraries' bibliographic data. This ID is derived by
>>analysis of the collection once the items are released then catalogued.
>>I am not certain that a similar level work ID would be possible in
>>trade, outside of being done by the author, agent or rights manager to
>>truly combine all of the works (in a FRBR sense) under a single ID.
>>Identifying, say, the hardcover book of a story, the comic book version
>>of that same story, the Blu-ray DVD of that story, the Broadway play of
>>that story, and the Swedish translation of the book into a single
>>Work-level ID is only something that can be done after the fact,
>>because their expressions are very, very different. The closest that we
>>might come to identifying that pre-production is to ID the rights
>>associated with a particular intellectual property. And while it may be
>>useful in practice, I don't know it would be useful in application.
>>Which, I expect in the end would only serve the purpose of making lots
>>of IP lawyers very wealthy.
>>
>>Todd
>>
>>
>>
>>
>>On Sep 25, 2014, at 5:07 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
>>
>>> Maybe this was already discussed, but I'm in favor of a distributed
>>>ID system rather than a single, central system.
>>> 
>>> Take DNS. Or Java namespaces. Their prefixes come from domain names
>>>that authors own, which are unique; authors can then define the rest
>>>however they like.
>>>If a publisher wants to use ISBN, they could use, for instance,
>>><epub://isbn-international.org/123456789>.
>>> 
>>> Since what we want is to identify publications, as long as authors or
>>>publications agree to use consistent domains/postfixes, I guess we can
>>>guarantee the uniqueness.
>>> 
>>> Maybe there are more use cases for the ID than identifying
>>>publications? The use cases I have in mind are links between
>>>publications and OA; these, I think, a distributed system can handle.
>>> 
>>> /koji
>>> 
>>> On Sep 25, 2014, at 12:51 PM, Ivan Herman <ivan@w3.org> wrote:
>>> 
>>>> 
>>>> On 24 Sep 2014, at 23:14 , Laura Dawson <Laura.Dawson@bowker.com>
>>>>wrote:
>>>> 
>>>>> True. It's a cluttered road.
>>>> 
>>>> We are in a really dangerous business!
>>>> 
>>>> Ivan
>>>> 
>>>>> 
>>>>> On 9/24/14, 5:12 PM, "David (Standards) Singer" <singer@apple.com>
>>>>>wrote:
>>>>> 
>>>>>> 
>>>>>> On Sep 24, 2014, at 12:16 , LAURA DAWSON <ljndawson@gmail.com>
>>>>>>wrote:
>>>>>> 
>>>>>>> Yes, Bowker were a DOI registration agency and I can tell you
>>>>>>>that the  associated systems and metadata were the primary reason
>>>>>>>DOIs for trade  books (as opposed to STEM/scholarly) never took
>>>>>>>off.
>>>>>>> 
>>>>>>> So you see, Ivan, the road to book URIs is littered with a couple
>>>>>>> of corpses.
>>>>>> 
>>>>>> It's not just books.  I was on a project that needed something for
>>>>>>recordings many years ago, and that road was also strewn with
>>>>>>corpses.
>>>>>> 
>>>>>>> 
>>>>>>> On 9/24/14, 3:13 PM, "Bill Kasdorf" <bkasdorf@apexcovantage.com>
>>>>>>>wrote:
>>>>>>> 
>>>>>>>> Actually, the DOI _is_ used for this, mainly by scholarly/STM
>>>>>>>>publishers,  as well as for chapters of books--typically one DOI
>>>>>>>>for the book and a  DOI for each chapter (and sometimes DOIs at
>>>>>>>>even lower component  levels,  most often for figures and
>>>>>>>>tables). And these are _agnostic_ as to  format, they typically
>>>>>>>>mean "the book" and "the chapter" in the  abstract  sense. When
>>>>>>>>you click on one of these DOIs you are usually then given  your
>>>>>>>>choice of what format, whether you have access, how to obtain
>>>>>>>>access, etc.
>>>>>>>> 
>>>>>>>> But it requires the associated systems, metadata, registration
>>>>>>>>agency,  etc. to make it work. To belabor a point, though, in
>>>>>>>>that context it  does  work. There are a gazillion of them. The
>>>>>>>>whole scholarly/STM ecosystem  is  now dependent on DOIs.
>>>>>>>> 
>>>>>>>> Those that use the DOI for this use CrossRef DOIs, which
>>>>>>>>_should_ be  expressed as URIs (and increasingly are).
>>>>>>>> 
>>>>>>>> But all that is purely under the control of the publisher
>>>>>>>>(including  what  the DOI links to and what that destination
>>>>>>>>provides--not necessarily  the  content itself); it doesn't
>>>>>>>>address "work" in the way librarians mean  "work," and it
>>>>>>>>requires the systems I mentioned (including the Handle  system on
>>>>>>>>which DOI is based). It would not work for our need to point  to
>>>>>>>>the "work itself" or some component of the work. So the answer in
>>>>>>>>a  purely standard web-world sense is still no.
>>>>>>>> 
>>>>>>>> --Bill K
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Laura Dawson [mailto:Laura.Dawson@bowker.com]
>>>>>>>> Sent: Wednesday, September 24, 2014 2:55 PM
>>>>>>>> To: Ivan Herman; Graham Bell
>>>>>>>> Cc: Laura Dawson; Phil Madans; Bill Kasdorf; W3C Public Digital
>>>>>>>> Publishing IG Mailing List
>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>> 
>>>>>>>> As it stands now, no. So a book's "home" on the web (regardless of
>>>>>>>>edition) is not standardizable at this point unless you want to go
>>>>>>>>down the DOI road (please let's not go down the DOI road).
>>>>>>>> 
>>>>>>>> On 9/24/14, 4:13 AM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>>>>> 
>>>>>>>>> Thanks for all the interesting discussion...
>>>>>>>>> 
>>>>>>>>> However: all this is to say that there does not seem to be any
>>>>>>>>>existing (and viable) option to uniquely identify (preferably
>>>>>>>>>through a URI) a 'work' (whether in the ISTC or the FRBR sense).
>>>>>>>>>Which is a problem for metadata as well as for archiving. :-( Tell
>>>>>>>>>me I am wrong, please...
>>>>>>>>> 
>>>>>>>>> Ivan
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 24 Sep 2014, at 24:19 , Graham Bell <graham@editeur.org>
>>>>>>>>>wrote:
>>>>>>>>> 
>>>>>>>>>> And they can be treated this way in ONIX too. As I said,
>>>>>>>>>> 
>>>>>>>>>>> they are not (strictly) an attribute of the ISBN, though they
>>>>>>>>>>>may be  presented as such in various systems
>>>>>>>>>> 
>>>>>>>>>> G
>>>>>>>>>> 
>>>>>>>>>> NB repeatable because the ISBN is associated directly with
>>>>>>>>>>only one  work, but can be indirectly associated (through that
>>>>>>>>>>work) with  several other works.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 23 Sep 2014, at 21:12, LAURA DAWSON wrote:
>>>>>>>>>> 
>>>>>>>>>>> Yes, even at Bowker we made them a repeatable attribute on
>>>>>>>>>>>the ISBN  record.
>>>>>>>>>>> 
>>>>>>>>>>> From: "Madans, Phil" <Phil.Madans@hbgusa.com>
>>>>>>>>>>> Date: Tuesday, September 23, 2014 at 3:13 PM
>>>>>>>>>>> To: Laura Dawson <ljndawson@gmail.com>, Graham Bell
>>>>>>>>>>><graham@editeur.org>, Bill Kasdorf
>>>>>>>>>>><bkasdorf@apexcovantage.com>,  Ivan  Herman <ivan@w3.org>, W3C
>>>>>>>>>>>Public Digital Publishing IG Mailing List
>>>>>>>>>>><public-digipub-ig-comment@w3.org>
>>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>>> 
>>>>>>>>>>> I stand corrected on the assignment of the ISTC. Bad choice
>>>>>>>>>>>of  words.
>>>>>>>>>>> I was speaking more on how I would have to manage them
>>>>>>>>>>>internally on  the systems level―that's how I think about
>>>>>>>>>>>these things―and that  would be as an attribute.  That  all
>>>>>>>>>>>depends on how titles systems  are structured, and I'm not
>>>>>>>>>>>saying ours is the best way to do  things,  but I think the
>>>>>>>>>>>way we do it is how most do it these days. From a  practical
>>>>>>>>>>>standpoint, I'm not sure how else I could handle them. If I
>>>>>>>>>>>publish an English and Spanish edition of a work, and the
>>>>>>>>>>>ISTC's are  different, then they would be attributes of the
>>>>>>>>>>>ISBNs so that I  could  keep them linked internally.  We are
>>>>>>>>>>>already doing this, as is most  everyone else, and I think
>>>>>>>>>>>that is why the ISTC was such a hard  sell.
>>>>>>>>>>> 
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> Phil Madans | Executive Director of Digital Publishing
>>>>>>>>>>>Technology |  Hachette Book Group | 237 Park Avenue NY 10017
>>>>>>>>>>>|212-364-1415 |  phil.madans@hbgusa.com
>>>>>>>>>>> 
>>>>>>>>>>> From: LAURA DAWSON <ljndawson@gmail.com>
>>>>>>>>>>> Date: Tuesday, September 23, 2014 at 2:22 PM
>>>>>>>>>>> To: Graham Bell <graham@editeur.org>, Phil Madans
>>>>>>>>>>><phil.madans@hbgusa.com>, Bill Kasdorf
>>>>>>>>>>><bkasdorf@apexcovantage.com>,
>>>>>>>>>>> Ivan Herman <ivan@w3.org>, W3C Public Digital Publishing IG
>>>>>>>>>>>Mailing  List <public-digipub-ig-comment@w3.org>
>>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>>> 
>>>>>>>>>>> Bowker was an ISTC registration agency until recently. We
>>>>>>>>>>>pulled out  because of the lack of support in the US, and
>>>>>>>>>>>refer the few curious  to Nielsen.
>>>>>>>>>>> 
>>>>>>>>>>> From: Graham Bell <graham@editeur.org>
>>>>>>>>>>> Date: Tuesday, September 23, 2014 at 2:09 PM
>>>>>>>>>>> To: Phil Madans <Phil.Madans@hbgusa.com>, Laura Dawson
>>>>>>>>>>><ljndawson@gmail.com>, Bill Kasdorf
>>>>>>>>>>><bkasdorf@apexcovantage.com>,
>>>>>>>>>>> Ivan Herman <ivan@w3.org>, W3C Public Digital Publishing IG
>>>>>>>>>>>Mailing  List <public-digipub-ig-comment@w3.org>
>>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>>> 
>>>>>>>>>>> What Phil and Laura have written certainly summarises -- and
>>>>>>>>>>> illustrates -- the debate over identifiers.
>>>>>>>>>>> 
>>>>>>>>>>> But the text below (from Phil) is a little misleading.
>>>>>>>>>>> 
>>>>>>>>>>>> Whether an ISTC
>>>>>>>>>>>> is a real work Identifier or not is a matter of debate. I
>>>>>>>>>>>>disagree that it is. It is actually an attribute of the
>>>>>>>>>>>>ISBN--that is how they are assigned.
>>>>>>>>>>>> Different ISBNs of the same master content might have
>>>>>>>>>>>>different  ISTC's.
>>>>>>>>>>>> Translations for instance.
>>>>>>>>>>> 
>>>>>>>>>>> The 'rules' of the ISTC say that translations are by
>>>>>>>>>>>definition  different works, and MUST have different ISTCs
>>>>>>>>>>>(though those ISTCs  will be related to each other -- one is a
>>>>>>>>>>>'derived work', and this  close relationship is recorded in
>>>>>>>>>>>the registration metadata for the  ISTCs themselves). This
>>>>>>>>>>>contrasts with library practice, where 'work' is something at a
>>>>>>>>>>>higher level and two translations are
>>>>>>>>>>>actually  termed two 'expressions' of the same 'work'. In
>>>>>>>>>>>library terms, the  ISTC is an expression identifier. See the
>>>>>>>>>>>attached PDF (a slide from  a training session that I deliver
>>>>>>>>>>>fairly regularly) for a summary of  how the <indecs> model on
>>>>>>>>>>>which ISTC and ONIX are based compares  with  the FRBR library
>>>>>>>>>>>model. There is -- as far as I know -- no public  identifier
>>>>>>>>>>>that works at the FRBR:work level, though libraries may  have
>>>>>>>>>>>internal IDs.
>>>>>>>>>>> 
>>>>>>>>>>> And I'm pretty sure ISTCs can be assigned without an ISBN
>>>>>>>>>>>(and without any product ID at all, in fact) -- they are not
>>>>>>>>>>>(strictly) an attribute of the ISBN, though they may be
>>>>>>>>>>>presented as such in various systems. They can be registered
>>>>>>>>>>>based on a manuscript, prior to there being a product.
>>>>>>>>>>> 
>>>>>>>>>>> On the other hand, there's no doubt that ISTC has so far
>>>>>>>>>>>proved  unpopular among publishers, for some of the reasons
>>>>>>>>>>>Laura and Phil  list, and its actual usage is minimal.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Graham
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Graham Bell
>>>>>>>>>>> EDItEUR
>>>>>>>>>>> 
>>>>>>>>>>> Tel: +44 20 7503 6418
>>>>>>>>>>> Mob: +44 7887 754958
>>>>>>>>>>> 
>>>>>>>>>>> EDItEUR Limited is a company limited by guarantee, registered
>>>>>>>>>>> in England no 2994705. Registered Office: United House, North
>>>>>>>>>>> Road, London
>>>>>>>>>>> N7 9DP, UK. Website: http://www.editeur.org
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ----
>>>>>>>>> Ivan Herman, W3C
>>>>>>>>> Digital Publishing Activity Lead
>>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>>> mobile: +31-641044153
>>>>>>>>> GPG: 0x343F1A3D
>>>>>>>>> WebID: http://www.ivan-herman.net/foaf#me
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> David Singer
>>>>>> Manager, Software Standards, Apple Inc.
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>
>
Received on Thursday, 25 September 2014 15:07:00 UTC