Re: As an aside, a possibly interesting read.... from David (Standards) Singer on 2014-09-25 (public-digipub@w3.org from September 2014)

From: David (Standards) Singer <singer@apple.com>
Date: Thu, 25 Sep 2014 09:44:09 -0700
To: Bill Kasdorf <bkasdorf@apexcovantage.com>
Cc: Laura Dawson <Laura.Dawson@bowker.com>, "Todd Carpenter (Gmail)" <tcarpenter@niso.org>, Koji Ishii <kojiishi@gluesoft.co.jp>, Ivan Herman <ivan@w3.org>, Laura Dawson <ljndawson@gmail.com>, Graham Bell <graham@editeur.org>, Phil Madans <Phil.Madans@hbgusa.com>, W3C Public Digital Publishing IG Mailing List <public-digipub-ig-comment@w3.org>
Message-id: <87289B33-8F73-4D59-B56A-2E879F826A75@apple.com>
I am wondering whether we have historically focused on the wrong question, notably “is the ID unique?”.  Of the projects I know about, I think too little time was spent on what the ‘promise’ was, and hence ‘unique in what sense?’.

Looking at a specific example, say I have a scheme to give IDs to physical books. If I re-publish the exact same text but with a different page or font size, so the pagination is different, does that get the same ID or a different one?  Well, it must be different if you expect to be able to refer to text by page and line number. Did the promise include that such references would be stable?

This failure mode — the assigner thought that the promise was X, the user Y — has been the death of labeling systems. If you cannot reliably use the label for a purpose, then it may be use-less.

“Do I have this item in stock?”
“Can I refer to parts of it stably?”
and so on...

On Sep 25, 2014, at 8:04 , Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:

> I also want to point out that what we really need is not just about books.
> 
> Even though there has been frequent discussion on the IG about whether we can _focus_ on books (and the consensus, which I reluctantly went along with, is yes), for something this fundamental we really need to think in terms of a _publication_ or even a _resource_.
> 
> Even in traditionally book-dominated sectors like educational publishing, there is a rapid movement away from the concept of a "book" at all. Professors increasingly are willing to let students use any of a range of "textbooks" as a resource for, say, calculus or microbiology, as long as they are useful and have information that is relevant to the course. Increasingly those "books" themselves are being deconstructed, and more importantly most big educational publishers are moving toward a vision in which they develop resources first and books (or parts of books) are just one of many ways of associating, combining, and distributing those resources. And that is done in the context of _all the other stuff out there_ (mostly but not exclusively on the Web).
> 
> All that stuff has to be able to be identified, cited, annotated, etc. etc.
> 
> I could have written that description just as well in the context of magazines, for which _exactly the same dynamic_ is happening. Right now.
> 
> Same for scholarly/STM publishing (where publishing _data_--and citing datasets--is a very live issue). And even in the humanities, where "Digital Humanities" is becoming mainstream (and which is about "works" in the FRBR sense).
> 
> And think of all the resources needed in corporate publishing, training, etc.
> 
> All of that is "publishing." No publication exists in a closed system. It may think it is in a walled garden but there is a giant jungle outside its walls.
> 
> I really think in the pursuit of this identifier issue we MUST take the broadest possible vision or we will come up with something that is useful in one sector (perhaps) but not truly interoperable in the publishing ecosystem and the web in general (the context in which the publishing ecosystem increasingly lives and works) and will thus ultimately prove inadequate.
> 
> This is not to replace domain-specific or purpose-built identifiers like the DOI, the ISBN, etc.--those that, as Todd and others pointed out, have metadata and systems associated with them to DO THINGS. Any identifier we come up with should not make those obsolete and ideally should not conflict with them at all. It should make them more interoperable and more useful. This is not a Battle of Identifiers, and those who think One and Only One Identifier is the goal are mistaken. Many identifiers are needed because we need to do many different things with them.
> 
> But the identifier we are looking for here--enabling annotation and a myriad other related things on the Web (citation, previews, chunking, etc.)--needs to be radically widely applicable, completely agnostic as to the type of publication or resource it identifies, the format in which that publication or resource is disseminated, and yet durable, persistent, and reliable across formats and across time.
> 
> --Bill Kasdorf
> 
> -----Original Message-----
> From: Laura Dawson [mailto:Laura.Dawson@bowker.com] 
> Sent: Thursday, September 25, 2014 9:01 AM
> To: Todd Carpenter (Gmail); Koji Ishii
> Cc: Ivan Herman; David (Standards) Singer; Laura Dawson; Bill Kasdorf; Graham Bell; Phil Madans; W3C Public Digital Publishing IG Mailing List
> Subject: Re: As an aside, a possibly interesting read....
> 
> Todd, I think you're absolutely right about the difference between librarianship and the trade. It has been the function of libraries to archive, curate, and canonize information since their inception. Trade is about one thing and one thing only - sales. In building infrastructure, we need to support both. What both have in common is a need for effective discovery - directing a reader to the book they want. So much of the metadata will be shared in common - that which describes the book; the metadata describing the terms by which a reader may have it will differ depending on... well, the terms - the environment in which the reader is discovering the book.
> 
> That all said, I can envision a world where - for the purposes of curation and archiving - there exists a "canonical" version of a book at a URI that could well consist of the ISBN for that book (as Koji described), but if you want to own the book, you are directed to whichever platforms support it, and you choose which one you want to read on. But that presupposes an authority to govern that system. I would say the ISBN-International Agency could be that authority, but there is one important issue that prevents that - no publisher is required to report back to ISBN-IA which ISBNs get assigned to which books. ISBNs are issued in blocks - and in the case of larger publishers, many never see the light of day. ISBN-IA does not maintain a database of the ISBNs that get assigned - that is down to the registration agencies (such as Bowker, Nielsen, national libraries). And the publishers don't always report back to the RA's which numbers they are assigning to which things.
> 
> Also to be considered - in a world of self-publishing, ISBNs frequently are not assigned at all. Books are available in proprietary systems only (Kindle), and not easily discoverable. Amazon is said to be publishing about 2000 of these per week. We have no idea what they are, if they are books or "shorts", fiction, memoir, cookbooks - only Amazon has that data, and the data is provided by author/publishers who are not necessarily familiar with metadata conventions and effective description.
> 
> So, to be succinct, whether distributed or centralized, we need to break down the specific problems based on audience and the pain we're trying to solve. Probably won't be a single solution.
> 
> On 9/25/14, 2:58 AM, "Todd Carpenter (Gmail)" <tcarpenter@niso.org> wrote:
> 
>> There is a tremendous problem with distributed systems when it comes to 
>> canonical information and standard identifiers: namely, the 
>> metadata that is associated with that identifier.  An identifier is (or, 
>> better put, should be) just a dumb (i.e., without embedded meaning), 
>> unique string of characters. The structure of that string, while 
>> systematically important, is beside the point. Whether an identifier is 
>> expressed as a 16-digit string, as a URI, or as anything else is ultimately not the point.
>> 
>> The real power is in the associated metadata related to that identifier.
>> While there is tremendous overhead in a centralized system, such systems are 
>> critically important to a well-functioning ID system. Without a 
>> controlling system, there will be no standard set of associated 
>> metadata.  Now, how well that metadata is created, managed, curated and 
>> controlled are open questions (as Laura certainly knows), but without 
>> some authority driving compliance, there will inevitably be an 
>> increasing divergence of metadata quality, practice and interoperability.
>> 
>> 
>> Also, to Ivan's question about work-level IDs: there is work being done 
>> by OCLC to develop a true FRBR Work-level identifier based on their 
>> data store of libraries' bibliographic data. This ID is derived by 
>> analysis of the collection once the items are released and then catalogued. 
>> I am not certain that a similar work-level ID would be possible in 
>> trade, outside of being done by the author, agent or rights manager, to 
>> truly combine all of the works (in a FRBR sense) under a single ID.  
>> Combining, say, the hardcover book of a story, the comic book version of 
>> that same story, the Blu-ray DVD of that story, the Broadway play of 
>> that story, and the Swedish translation of the book under a single 
>> Work-level ID is something that can only be done after the fact, 
>> because their expressions are very, very different. The closest that we 
>> might come to identifying that pre-production is to ID the rights 
>> associated with a particular intellectual property. And while it may be 
>> useful in practice, I don't know that it would be useful in application. 
>> Which, I expect, in the end would only serve the purpose of making lots of IP lawyers very wealthy.
>> 
>> Todd
>> 
>> 
>> 
>> 
>> On Sep 25, 2014, at 5:07 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
>> 
>>> Maybe this was already discussed, but I'm in favor of a distributed 
>>> ID system rather than a single, central system.
>>> 
>>> Take DNS, or Java namespaces. Their prefixes come from domain names that 
>>> authors own, which are unique; authors can then define the rest however they like.
>>> If a publisher wants to use ISBN, they could use, for instance, 
>>> <epub://isbn-international.org/123456789>.
>>> 
>>> Since what we want is to identify publications, as long as authors or 
>>> publications agree to use consistent domains/postfixes, I guess we can 
>>> guarantee the uniqueness.
>>> 
>>> Maybe there are more use cases for the ID than identifying 
>>> publications? The use cases I have in mind are links between 
>>> publications and OA; these, I think, a distributed system can handle.
>>> 
>>> /koji
>>> 
>>> On Sep 25, 2014, at 12:51 PM, Ivan Herman <ivan@w3.org> wrote:
>>> 
>>>> 
>>>> On 24 Sep 2014, at 23:14 , Laura Dawson <Laura.Dawson@bowker.com>
>>>> wrote:
>>>> 
>>>>> True. It's a cluttered road.
>>>> 
>>>> We are in a really dangerous business!
>>>> 
>>>> Ivan
>>>> 
>>>>> 
>>>>> On 9/24/14, 5:12 PM, "David (Standards) Singer" <singer@apple.com>
>>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> On Sep 24, 2014, at 12:16 , LAURA DAWSON <ljndawson@gmail.com> wrote:
>>>>>> 
>>>>>>> Yes, Bowker were a DOI registration agency and I can tell you 
>>>>>>> that the  associated systems and metadata were the primary reason 
>>>>>>> DOIs for trade  books (as opposed to STEM/scholarly) never took 
>>>>>>> off.
>>>>>>> 
>>>>>>> So you see, Ivan, the road to book URIs is littered with a couple 
>>>>>>> of corpses.
>>>>>> 
>>>>>> It's not just books.  I was on a project that needed something for  
>>>>>> recordings many years ago, and that road was also strewn with 
>>>>>> corpses.
>>>>>> 
>>>>>>> 
>>>>>>> On 9/24/14, 3:13 PM, "Bill Kasdorf" <bkasdorf@apexcovantage.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Actually, the DOI _is_ used for this, mainly by scholarly/STM  
>>>>>>>> publishers,  as well as for chapters of books--typically one DOI 
>>>>>>>> for the book and a  DOI for each chapter (and sometimes DOIs at 
>>>>>>>> even lower component  levels,  most often for figures and 
>>>>>>>> tables). And these are _agnostic_ as to  format, they typically 
>>>>>>>> mean "the book" and "the chapter" in the  abstract  sense. When 
>>>>>>>> you click on one of these DOIs you are usually then given  your 
>>>>>>>> choice of what format, whether you have access, how to obtain  
>>>>>>>> access, etc.
>>>>>>>> 
>>>>>>>> But it requires the associated systems, metadata, registration 
>>>>>>>> agency,  etc. to make it work. To belabor a point, though, in 
>>>>>>>> that context it  does  work. There are a gazillion of them. The 
>>>>>>>> whole scholarly/STM ecosystem  is  now dependent on DOIs.
>>>>>>>> 
>>>>>>>> Those that use the DOI for this use CrossRef DOIs, which 
>>>>>>>> _should_ be  expressed as URIs (and increasingly are).
>>>>>>>> 
>>>>>>>> But all that is purely under the control of the publisher 
>>>>>>>> (including  what  the DOI links to and what that destination 
>>>>>>>> provides--not necessarily  the  content itself); it doesn't 
>>>>>>>> address "work" in the way librarians mean  "work," and it 
>>>>>>>> requires the systems I mentioned (including the Handle  system on 
>>>>>>>> which DOI is based). It would not work for our need to point  to  
>>>>>>>> the "work itself" or some component of the work. So the answer in 
>>>>>>>> a  purely standard web-world sense is still no.
>>>>>>>> 
>>>>>>>> --Bill K
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Laura Dawson [mailto:Laura.Dawson@bowker.com]
>>>>>>>> Sent: Wednesday, September 24, 2014 2:55 PM
>>>>>>>> To: Ivan Herman; Graham Bell
>>>>>>>> Cc: Laura Dawson; Phil Madans; Bill Kasdorf; W3C Public Digital 
>>>>>>>> Publishing IG Mailing List
>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>> 
>>>>>>>> As it stands now, no. So a book's "home" on the web (regardless 
>>>>>>>> of edition) is not standardizable at this point unless you want to 
>>>>>>>> go down the DOI road (please let's not go down the DOI road).
>>>>>>>> 
>>>>>>>> On 9/24/14, 4:13 AM, "Ivan Herman" <ivan@w3.org> wrote:
>>>>>>>> 
>>>>>>>>> Thanks for all the interesting discussion...
>>>>>>>>> 
>>>>>>>>> However: all this is to say that there does not seem to be any 
>>>>>>>>> existing (and viable) option to uniquely identify (preferably 
>>>>>>>>> through a URI) a 'work' (whether in the ISTC or the FRBR sense). 
>>>>>>>>> Which is a problem for metadata as well as for archiving. :-( 
>>>>>>>>> Tell me I am wrong, please...
>>>>>>>>> 
>>>>>>>>> Ivan
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 24 Sep 2014, at 24:19 , Graham Bell <graham@editeur.org> wrote:
>>>>>>>>> 
>>>>>>>>>> And they can be treated this way in ONIX too. As I said,
>>>>>>>>>> 
>>>>>>>>>>> they are not (strictly) an attribute of the ISBN, though they 
>>>>>>>>>>> may be  presented as such in various systems
>>>>>>>>>> 
>>>>>>>>>> G
>>>>>>>>>> 
>>>>>>>>>> NB repeatable because the ISBN is associated directly with 
>>>>>>>>>> only one  work, but can be indirectly associated (through that 
>>>>>>>>>> work) with  several other works.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 23 Sep 2014, at 21:12, LAURA DAWSON wrote:
>>>>>>>>>> 
>>>>>>>>>>> Yes, even at Bowker we made them a repeatable attribute on 
>>>>>>>>>>> the ISBN  record.
>>>>>>>>>>> 
>>>>>>>>>>> From: "Madans, Phil" <Phil.Madans@hbgusa.com>
>>>>>>>>>>> Date: Tuesday, September 23, 2014 at 3:13 PM
>>>>>>>>>>> To: Laura Dawson <ljndawson@gmail.com>, Graham Bell  
>>>>>>>>>>> <graham@editeur.org>, Bill Kasdorf 
>>>>>>>>>>> <bkasdorf@apexcovantage.com>,  Ivan  Herman <ivan@w3.org>, W3C 
>>>>>>>>>>> Public Digital Publishing IG Mailing List  
>>>>>>>>>>> <public-digipub-ig-comment@w3.org>
>>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>>> 
>>>>>>>>>>> I stand corrected on the assignment of the ISTC. Bad choice 
>>>>>>>>>>> of  words.
>>>>>>>>>>> I was speaking more on how I would have to manage them 
>>>>>>>>>>> internally on  the systems level―that's how I think about 
>>>>>>>>>>> these things―and that  would be as an attribute.  That  all 
>>>>>>>>>>> depends on how titles systems  are structured, and I'm not 
>>>>>>>>>>> saying ours is the best way to do  things,  but I think the 
>>>>>>>>>>> way we do it is how most do it these days. From a  practical 
>>>>>>>>>>> standpoint, I'm not sure how else I could handle them. If I 
>>>>>>>>>>> publish an English and Spanish edition of a work, and the 
>>>>>>>>>>> ISTC's are  different, then they would be attributes of the 
>>>>>>>>>>> ISBNs so that I  could  keep them linked internally.  We are 
>>>>>>>>>>> already doing this, as is most  everyone else, and I think 
>>>>>>>>>>> that is why the ISTC was such a hard  sell.
>>>>>>>>>>> 
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> Phil Madans | Executive Director of Digital Publishing 
>>>>>>>>>>> Technology |  Hachette Book Group | 237 Park Avenue NY 10017 
>>>>>>>>>>> |212-364-1415 |  phil.madans@hbgusa.com
>>>>>>>>>>> 
>>>>>>>>>>> From: LAURA DAWSON <ljndawson@gmail.com>
>>>>>>>>>>> Date: Tuesday, September 23, 2014 at 2:22 PM
>>>>>>>>>>> To: Graham Bell <graham@editeur.org>, Phil Madans  
>>>>>>>>>>> <phil.madans@hbgusa.com>, Bill Kasdorf 
>>>>>>>>>>> <bkasdorf@apexcovantage.com>,
>>>>>>>>>>> Ivan Herman <ivan@w3.org>, W3C Public Digital Publishing IG 
>>>>>>>>>>> Mailing  List <public-digipub-ig-comment@w3.org>
>>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>>> 
>>>>>>>>>>> Bowker was an ISTC registration agency until recently. We 
>>>>>>>>>>> pulled out  because of the lack of support in the US, and 
>>>>>>>>>>> refer the few curious  to Nielsen.
>>>>>>>>>>> 
>>>>>>>>>>> From: Graham Bell <graham@editeur.org>
>>>>>>>>>>> Date: Tuesday, September 23, 2014 at 2:09 PM
>>>>>>>>>>> To: Phil Madans <Phil.Madans@hbgusa.com>, Laura Dawson  
>>>>>>>>>>> <ljndawson@gmail.com>, Bill Kasdorf 
>>>>>>>>>>> <bkasdorf@apexcovantage.com>,
>>>>>>>>>>> Ivan Herman <ivan@w3.org>, W3C Public Digital Publishing IG 
>>>>>>>>>>> Mailing  List <public-digipub-ig-comment@w3.org>
>>>>>>>>>>> Subject: Re: As an aside, a possibly interesting read....
>>>>>>>>>>> 
>>>>>>>>>>> What Phil and Laura have written certainly summarises -- and 
>>>>>>>>>>> illustrates -- the debate over identifiers.
>>>>>>>>>>> 
>>>>>>>>>>> But the text below (from Phil) is a little misleading.
>>>>>>>>>>> 
>>>>>>>>>>>> Whether an ISTC
>>>>>>>>>>>> is a real work Identifier or not is a matter of debate. I 
>>>>>>>>>>>> disagree that it is. It is actually an attribute of the 
>>>>>>>>>>>> ISBN -- that is how they are assigned.
>>>>>>>>>>>> Different ISBNs of the same master content might have 
>>>>>>>>>>>> different  ISTC's.
>>>>>>>>>>>> Translations for instance.
>>>>>>>>>>> 
>>>>>>>>>>> The 'rules' of the ISTC say that translations are by 
>>>>>>>>>>> definition  different works, and MUST have different ISTCs 
>>>>>>>>>>> (though those ISTCs  will be related to each other -- one is a 
>>>>>>>>>>> 'derived work', and this  close relationship is recorded in 
>>>>>>>>>>> the registration metadata for the  ISTCs themselves). This 
>>>>>>>>>>> contrasts with library practice, where  'work'
>>>>>>>>>>> is something at a higher level and two translations are 
>>>>>>>>>>> actually  termed two 'expressions' of the same 'work'. In 
>>>>>>>>>>> library terms, the  ISTC is an expression identifier. See the 
>>>>>>>>>>> attached PDF (a slide from  a training session that I deliver 
>>>>>>>>>>> fairly regularly) for a summary of  how the <indecs> model on 
>>>>>>>>>>> which ISTC and ONIX are based compares  with  the FRBR library 
>>>>>>>>>>> model. There is -- as far as I know -- no public  identifier 
>>>>>>>>>>> that works at the FRBR:work level, though libraries may  have 
>>>>>>>>>>> internal IDs.
>>>>>>>>>>> 
>>>>>>>>>>> And I'm pretty sure ISTCs can be assigned without an ISBN 
>>>>>>>>>>> (and without any product ID at all, in fact) -- they are not 
>>>>>>>>>>> (strictly) an attribute of the ISBN, though they may be 
>>>>>>>>>>> presented as such in various systems. They can be registered 
>>>>>>>>>>> based on a manuscript, prior to there being a product.
>>>>>>>>>>> 
>>>>>>>>>>> On the other hand, there's no doubt that ISTC has so far 
>>>>>>>>>>> proved  unpopular among publishers, for some of the reasons 
>>>>>>>>>>> Laura and Phil  list, and its actual usage is minimal.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Graham
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Graham Bell
>>>>>>>>>>> EDItEUR
>>>>>>>>>>> 
>>>>>>>>>>> Tel: +44 20 7503 6418
>>>>>>>>>>> Mob: +44 7887 754958
>>>>>>>>>>> 
>>>>>>>>>>> EDItEUR Limited is a company limited by guarantee, registered 
>>>>>>>>>>> in England no 2994705. Registered Office: United House, North 
>>>>>>>>>>> Road, London
>>>>>>>>>>> N7 9DP, UK. Website: http://www.editeur.org
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ----
>>>>>>>>> Ivan Herman, W3C
>>>>>>>>> Digital Publishing Activity Lead
>>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>>> mobile: +31-641044153
>>>>>>>>> GPG: 0x343F1A3D
>>>>>>>>> WebID: http://www.ivan-herman.net/foaf#me
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> David Singer
>>>>>> Manager, Software Standards, Apple Inc.
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 

David Singer
Manager, Software Standards, Apple Inc.
Received on Thursday, 25 September 2014 16:45:01 UTC