Re: As an aside, a possibly interesting read.... from Laura Dawson on 2014-09-25 (public-digipub@w3.org from September 2014)

From: Laura Dawson <Laura.Dawson@bowker.com>
Date: Thu, 25 Sep 2014 13:01:14 +0000
To: "Todd Carpenter (Gmail)" <tcarpenter@niso.org>, Koji Ishii <kojiishi@gluesoft.co.jp>
CC: Ivan Herman <ivan@w3.org>, "David (Standards) Singer" <singer@apple.com>, Laura Dawson <ljndawson@gmail.com>, Bill Kasdorf <bkasdorf@apexcovantage.com>, Graham Bell <graham@editeur.org>, Phil Madans <Phil.Madans@hbgusa.com>, W3C Public Digital Publishing IG Mailing List <public-digipub-ig-comment@w3.org>
Message-ID: <D0498739.80536%laura.dawson@bowker.com>
Todd, I think you’re absolutely right about the difference between
librarianship and the trade. It has been the function of libraries to
archive, curate, and canonize information since their inception. Trade is
about one thing and one thing only - sales. In building infrastructure, we
need to support both. What both have in common is a need for effective
discovery - directing a reader to the book they want. So much of the
metadata will be shared in common - that which describes the book; the
metadata describing the terms by which a reader may have it will differ
depending on…well, the terms - the environment in which the reader is
discovering the book.
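
A minimal sketch of that split, with descriptive metadata shared and terms varying by environment. Every class and field name here is a hypothetical chosen for illustration and is not taken from ONIX or any other standard:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Description:
    """Descriptive metadata: the same for library and trade (hypothetical fields)."""
    title: str
    author: str
    subjects: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Terms:
    """How a reader may obtain the book in one environment (hypothetical fields)."""
    environment: str  # e.g. "library" or "retail"
    mode: str         # e.g. "loan" or "purchase"
    price: str = ""   # free text; empty when not applicable


@dataclass
class Record:
    """One shared description plus any number of environment-specific terms."""
    description: Description
    terms: List[Terms] = field(default_factory=list)

    def terms_for(self, environment: str) -> List[Terms]:
        # Discovery uses the shared description; only the terms are filtered.
        return [t for t in self.terms if t.environment == environment]
```

The point of the shape is that a library and a retailer could exchange the same Description and differ only in the Terms they attach.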

That all said, I can envision a world where - for the purposes of curation
and archiving - there exists a “canonical” version of a book at a URI that
could well consist of the ISBN for that book (as Koji described), but if
you want to own the book, you are directed to whichever platforms support
it, and you choose which one you want to read on. But that presupposes an
authority to govern that system. I would say the ISBN-International Agency
could be that authority, but there is one important issue that prevents
that - no publisher is required to report back to ISBN-IA which ISBNs get
assigned to which books. ISBNs are issued in blocks - and in the case of
larger publishers, many never see the light of day. ISBN-IA does not
maintain a database of the ISBNs that get assigned - that is down to the
registration agencies (such as Bowker, Nielsen, national libraries). And
the publishers don’t always report back to the RAs which numbers they are
assigning to which things.
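
To make the idea concrete, here is a rough sketch of what an ISBN-based canonical identifier and a platform lookup might look like. The check-digit rule is the standard ISBN-13 one and `urn:isbn:` is a registered URN namespace; the `PLATFORMS` table, the `platforms_for` function, and the example.com-style store URLs are assumptions made up for illustration. No authoritative registry like this exists, which is exactly the gap described above.

```python
from typing import Dict, List


def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13 check digit; hyphens and spaces are ignored."""
    if any(not (c.isdigit() or c in "- ") for c in isbn):
        return False
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ... and the weighted sum must be a multiple of 10.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def isbn_urn(isbn: str) -> str:
    """Return the URN form of a valid ISBN-13, with separators removed."""
    if not isbn13_is_valid(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return "urn:isbn:" + "".join(c for c in isbn if c.isdigit())


# Hypothetical registry: canonical identifier -> places where the book can be had.
PLATFORMS: Dict[str, List[str]] = {
    "urn:isbn:9780306406157": [
        "https://store-a.example/9780306406157",
        "https://store-b.example/9780306406157",
    ],
}


def platforms_for(isbn: str) -> List[str]:
    """Look up where a book is available; an empty list means nobody reported it."""
    return PLATFORMS.get(isbn_urn(isbn), [])
```

An unreported ISBN validates and gets a URN just fine but resolves to nothing, which is the practical effect of publishers not reporting assignments back.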

Also to be considered - in a world of self-publishing, ISBNs frequently
are not assigned at all. Books are available in proprietary systems only
(Kindle), and not easily discoverable. Amazon is said to be publishing
about 2000 of these per week. We have no idea what they are, if they are
books or “shorts”, fiction, memoir, cookbooks - only Amazon has that data,
and the data is provided by author/publishers who are not necessarily
familiar with metadata conventions and effective description.

So, to be succinct, whether distributed or centralized, we need to break
down the specific problems based on audience and the pain we’re trying to
solve. There probably won’t be a single solution.

On 9/25/14, 2:58 AM, "Todd Carpenter (Gmail)" <tcarpenter@niso.org> wrote:

>There is a tremendous problem with distributed systems when it comes to
>canonical information and standard identifiers: namely, the metadata
>that is associated with that identifier.  An identifier is (or, better put,
>should be) just a dumb (i.e., without embedded meaning), unique
>string of characters. The structure of that string, while systematically
>important, is beside the point. Whether an identifier is expressed as a
>16-digit string, or as a URI, or anything else is not finally the point.
>
>The real power is in the associated metadata related to that identifier.
>While there is tremendous overhead in a centralized system, such systems are
>critically important in a well-functioning ID system. Without a
>controlling system, there will be no standard set of associated
>metadata.  Now, how well that metadata is created, managed, curated and
>controlled are open questions (as Laura certainly knows), but without
>some authority driving compliance, inevitably there will be an
>increasing divergence of metadata quality, practice and interoperability.
> 
>
>Also to Ivan’s question about work-level IDs, there is work being done by
>OCLC to develop a true FRBR Work-level identifier based on their data
>store of libraries’ bibliographic data. This ID is derived by analysis of
>the collection once the items are released then catalogued. I am not
>certain that a similar work-level ID would be possible in trade, outside
>of being done by the author, agent or rights manager to truly combine all
>of the works (in a FRBR sense) under a single ID.  Identifying, say, the
>hardcover book of a story, the comic book version of that same story, the
>Blu-ray disc of that story, the Broadway play of that story, and the
>Swedish translation of the book under a single Work-level ID is only
>something that can be done after the fact, because their expressions are
>very, very different. The closest that we might come to identifying that
>pre-production is to ID the rights associated with a particular
>intellectual property. And while it may be useful in practice, I don’t
>know that it would be useful in application. Which, I expect, in the end
>would only serve the purpose of making lots of IP lawyers very wealthy.
>
>Todd
>
>
>
>
>On Sep 25, 2014, at 5:07 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
>
>> Maybe this was already discussed, but I’m in favor of a distributed ID
>>system rather than a single, central system.
>> 
>> Take DNS, or Java namespaces. Their prefix comes from domain names that
>>authors own, which are unique; authors can then define the rest freely.
>>If a publisher wants to use ISBN, they could use, for instance,
>><epub://isbn-international.org/123456789>.
>> 
>> Since what we want is to identify publications, as long as authors or
>>publishers agree to use consistent domains/postfixes, I guess we can
>>guarantee the uniqueness.
>> 
>> Maybe there are more use cases for the ID than identifying
>>publications? The use cases I have in mind are links between
>>publications and OA, which I think a distributed system can handle.
>> 
>> /koji
>> 
>> On Sep 25, 2014, at 12:51 PM, Ivan Herman <ivan@w3.org> wrote:
>> 
>>> 
>>> On 24 Sep 2014, at 23:14 , Laura Dawson <Laura.Dawson@bowker.com>
>>>wrote:
>>> 
>>>> True. It’s a cluttered road.
>>> 
>>> We are in a really dangerous business!
>>> 
>>> Ivan
>>> 
>>>> 
>>>> On 9/24/14, 5:12 PM, "David (Standards) Singer" <singer@apple.com>
>>>>wrote:
>>>> 
>>>>> 
>>>>> On Sep 24, 2014, at 12:16 , LAURA DAWSON <ljndawson@gmail.com> wrote:
>>>>> 
>>>>>> Yes, Bowker were a DOI registration agency and I can tell you that
>>>>>>the
>>>>>> associated systems and metadata were the primary reason DOIs for
>>>>>>trade
>>>>>> books (as opposed to STEM/scholarly) never took off.
>>>>>> 
>>>>>> So you see, Ivan, the road to book URIs is littered with a couple of
>>>>>> corpses.
>>>>> 
>>>>> It’s not just books.  I was on a project that needed something for
>>>>> recordings many years ago, and that road was also strewn with
>>>>>corpses.
>>>>> 
>>>>>> 
>>>>>> On 9/24/14, 3:13 PM, "Bill Kasdorf" <bkasdorf@apexcovantage.com>
>>>>>>wrote:
>>>>>> 
>>>>>>> Actually, the DOI _is_ used for this, mainly by scholarly/STM
>>>>>>> publishers,
>>>>>>> as well as for chapters of books--typically one DOI for the book
>>>>>>>and a
>>>>>>> DOI for each chapter (and sometimes DOIs at even lower component
>>>>>>> levels,
>>>>>>> most often for figures and tables). And these are _agnostic_ as to
>>>>>>> format, they typically mean "the book" and "the chapter" in the
>>>>>>> abstract
>>>>>>> sense. When you click on one of these DOIs you are usually then
>>>>>>>given
>>>>>>> your choice of what format, whether you have access, how to obtain
>>>>>>> access, etc.
>>>>>>> 
>>>>>>> But it requires the associated systems, metadata, registration
>>>>>>>agency,
>>>>>>> etc. to make it work. To belabor a point, though, in that context
>>>>>>>it
>>>>>>> does
>>>>>>> work. There are a gazillion of them. The whole scholarly/STM
>>>>>>>ecosystem
>>>>>>> is
>>>>>>> now dependent on DOIs.
>>>>>>> 
>>>>>>> Those that use the DOI for this use CrossRef DOIs, which _should_
>>>>>>>be
>>>>>>> expressed as URIs (and increasingly are).
>>>>>>> 
>>>>>>> But all that is purely under the control of the publisher
>>>>>>>(including
>>>>>>> what
>>>>>>> the DOI links to and what that destination provides--not
>>>>>>>necessarily
>>>>>>> the
>>>>>>> content itself); it doesn't address "work" in the way librarians
>>>>>>>mean
>>>>>>> "work," and it requires the systems I mentioned (including the
>>>>>>>Handle
>>>>>>> system on which DOI is based). It would not work for our need to
>>>>>>>point
>>>>>>> to
>>>>>>> the "work itself" or some component of the work. So the answer in a
>>>>>>> purely standard web-world sense is still no.
>>>>>>> 
>>>>>>> --Bill K
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Laura Dawson [mailto:Laura.Dawson@bowker.com]
>>>>>>> Sent: Wednesday, September 24, 2014 2:55 PM
>>>>>>> To: Ivan Herman; Graham Bell
>>>>>>> Cc: Laura Dawson; Phil Madans; Bill Kasdorf; W3C Public Digital
>>>>>>> Publishing IG Mailing List
>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>> 
>>>>>>> As it stands now, no. So a book's "home" on the web (regardless of
>>>>>>> edition) is not standardizable at this point unless you want to go
>>>>>>>down
>>>>>>> the DOI road (please let's not go down the DOI road).
>>>>>>> 
>>>>>>> On 9/24/14, 4:13 AM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>>>> 
>>>>>>>> Thanks for all the interesting discussion...
>>>>>>>> 
>>>>>>>> However: all this is to say that there does not seem to be any
>>>>>>>> existing
>>>>>>>> (and viable) option to uniquely identify (preferably through a
>>>>>>>>URI) a
>>>>>>>> 'work' (whether in the ISTC or the FRBR sense). Which is a
>>>>>>>>problem for
>>>>>>>> metadata as well as for archiving. :-( Tell me I am wrong,
>>>>>>>>please...
>>>>>>>> 
>>>>>>>> Ivan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 24 Sep 2014, at 24:19 , Graham Bell <graham@editeur.org> wrote:
>>>>>>>> 
>>>>>>>>> And they can be treated this way in ONIX too. As I said,
>>>>>>>>> 
>>>>>>>>>> they are not (strictly) an attribute of the ISBN, though they
>>>>>>>>>>may be
>>>>>>>>>> presented as such in various systems
>>>>>>>>> 
>>>>>>>>> G
>>>>>>>>> 
>>>>>>>>> NB repeatable because the ISBN is associated directly with only
>>>>>>>>>one
>>>>>>>>> work, but can be indirectly associated (through that work) with
>>>>>>>>> several other works.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 23 Sep 2014, at 21:12, LAURA DAWSON wrote:
>>>>>>>>> 
>>>>>>>>>> Yes, even at Bowker we made them a repeatable attribute on the
>>>>>>>>>>ISBN
>>>>>>>>>> record.
>>>>>>>>>> 
>>>>>>>>>> From: "Madans, Phil" <Phil.Madans@hbgusa.com>
>>>>>>>>>> Date: Tuesday, September 23, 2014 at 3:13 PM
>>>>>>>>>> To: Laura Dawson <ljndawson@gmail.com>, Graham Bell
>>>>>>>>>> <graham@editeur.org>, Bill Kasdorf <bkasdorf@apexcovantage.com>,
>>>>>>>>>> Ivan
>>>>>>>>>> Herman <ivan@w3.org>, W3C Public Digital Publishing IG Mailing
>>>>>>>>>>List
>>>>>>>>>> <public-digipub-ig-comment@w3.org>
>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>> 
>>>>>>>>>> I stand corrected on the assignment of the ISTC. Bad choice of
>>>>>>>>>> words.
>>>>>>>>>> I was speaking more on how I would have to manage them
>>>>>>>>>>internally on
>>>>>>>>>> the systems level―that's how I think about these things―and that
>>>>>>>>>> would be as an attribute.  That all depends on how titles
>>>>>>>>>>systems
>>>>>>>>>> are structured, and I'm not saying ours is the best way to do
>>>>>>>>>> things,
>>>>>>>>>> but I think the way we do it is how most do it these days. From
>>>>>>>>>>a
>>>>>>>>>> practical standpoint, I'm not sure how else I could handle
>>>>>>>>>>them. If
>>>>>>>>>> I
>>>>>>>>>> publish an English and Spanish edition of a work, and the
>>>>>>>>>>ISTCs are
>>>>>>>>>> different, then they would be attributes of the ISBNs so that I
>>>>>>>>>> could
>>>>>>>>>> keep them linked internally.  We are already doing this, as is
>>>>>>>>>>most
>>>>>>>>>> everyone else, and I think that is why the ISTC was such a hard
>>>>>>>>>> sell.
>>>>>>>>>> 
>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>> Phil Madans | Executive Director of Digital Publishing
>>>>>>>>>>Technology |
>>>>>>>>>> Hachette Book Group | 237 Park Avenue NY 10017 |212-364-1415 |
>>>>>>>>>> phil.madans@hbgusa.com
>>>>>>>>>> 
>>>>>>>>>> From: LAURA DAWSON <ljndawson@gmail.com>
>>>>>>>>>> Date: Tuesday, September 23, 2014 at 2:22 PM
>>>>>>>>>> To: Graham Bell <graham@editeur.org>, Phil Madans
>>>>>>>>>> <phil.madans@hbgusa.com>, Bill Kasdorf
>>>>>>>>>><bkasdorf@apexcovantage.com>,
>>>>>>>>>> Ivan Herman <ivan@w3.org>, W3C Public Digital Publishing IG
>>>>>>>>>>Mailing
>>>>>>>>>> List <public-digipub-ig-comment@w3.org>
>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>> 
>>>>>>>>>> Bowker was an ISTC registration agency until recently. We
>>>>>>>>>>pulled out
>>>>>>>>>> because of the lack of support in the US, and refer the few
>>>>>>>>>>curious
>>>>>>>>>> to Nielsen.
>>>>>>>>>> 
>>>>>>>>>> From: Graham Bell <graham@editeur.org>
>>>>>>>>>> Date: Tuesday, September 23, 2014 at 2:09 PM
>>>>>>>>>> To: Phil Madans <Phil.Madans@hbgusa.com>, Laura Dawson
>>>>>>>>>> <ljndawson@gmail.com>, Bill Kasdorf
>>>>>>>>>><bkasdorf@apexcovantage.com>,
>>>>>>>>>> Ivan Herman <ivan@w3.org>, W3C Public Digital Publishing IG
>>>>>>>>>>Mailing
>>>>>>>>>> List <public-digipub-ig-comment@w3.org>
>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>> 
>>>>>>>>>> What Phil and Laura have written certainly summarises -- and
>>>>>>>>>> illustrates -- the debate over identifiers.
>>>>>>>>>> 
>>>>>>>>>> But the text below (from Phil) is a little misleading.
>>>>>>>>>> 
>>>>>>>>>>> Whether an ISTC
>>>>>>>>>>> is a real work Identifier or not is a matter of debate. I
>>>>>>>>>>>disagree
>>>>>>>>>>> that it is. It is actually an attribute of the ISBN―that is
>>>>>>>>>>>how
>>>>>>>>>>> they are assigned.
>>>>>>>>>>> Different ISBNs of the same master content might have different
>>>>>>>>>>> ISTC's.
>>>>>>>>>>> Translations for instance.
>>>>>>>>>> 
>>>>>>>>>> The 'rules' of the ISTC say that translations are by definition
>>>>>>>>>> different works, and MUST have different ISTCs (though those
>>>>>>>>>>ISTCs
>>>>>>>>>> will be related to each other -- one is a 'derived work', and
>>>>>>>>>>this
>>>>>>>>>> close relationship is recorded in the registration metadata for
>>>>>>>>>>the
>>>>>>>>>> ISTCs themselves). This contrasts with library practice, where
>>>>>>>>>> 'work'
>>>>>>>>>> is something at a higher level and two translations are actually
>>>>>>>>>> termed two 'expressions' of the same 'work'. In library terms,
>>>>>>>>>>the
>>>>>>>>>> ISTC is an expression identifier. See the attached PDF (a slide
>>>>>>>>>>from
>>>>>>>>>> a training session that I deliver fairly regularly) for a
>>>>>>>>>>summary of
>>>>>>>>>> how the <indecs> model on which ISTC and ONIX are based compares
>>>>>>>>>> with
>>>>>>>>>> the FRBR library model. There is -- as far as I know -- no
>>>>>>>>>>public
>>>>>>>>>> identifier that works at the FRBR:work level, though libraries
>>>>>>>>>>may
>>>>>>>>>> have internal IDs.
>>>>>>>>>> 
>>>>>>>>>> And I'm pretty sure ISTCs can be assigned without an ISBN (and
>>>>>>>>>> without any product ID at all, in fact) -- they are not
>>>>>>>>>>(strictly)
>>>>>>>>>> an
>>>>>>>>>> attribute of the ISBN, though they may be presented as such in
>>>>>>>>>> various
>>>>>>>>>> systems.
>>>>>>>>>> They can be registered based on a manuscript, prior to there
>>>>>>>>>>being a
>>>>>>>>>> product.
>>>>>>>>>> 
>>>>>>>>>> On the other hand, there's no doubt that ISTC has so far proved
>>>>>>>>>> unpopular among publishers, for some of the reasons Laura and
>>>>>>>>>>Phil
>>>>>>>>>> list, and its actual usage is minimal.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Graham
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Graham Bell
>>>>>>>>>> EDItEUR
>>>>>>>>>> 
>>>>>>>>>> Tel: +44 20 7503 6418
>>>>>>>>>> Mob: +44 7887 754958
>>>>>>>>>> 
>>>>>>>>>> EDItEUR Limited is a company limited by guarantee, registered in
>>>>>>>>>> England no 2994705. Registered Office: United House, North Road,
>>>>>>>>>> London
>>>>>>>>>> N7 9DP, UK. Website: http://www.editeur.org
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ----
>>>>>>>> Ivan Herman, W3C
>>>>>>>> Digital Publishing Activity Lead
>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>> mobile: +31-641044153
>>>>>>>> GPG: 0x343F1A3D
>>>>>>>> WebID: http://www.ivan-herman.net/foaf#me
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> David Singer
>>>>> Manager, Software Standards, Apple Inc.
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>
Received on Thursday, 25 September 2014 13:01:49 UTC