Re: As an aside, a possibly interesting read....

Hi David

Ultimately this 'promise' grows from the governance built into the standard, which builds trust, and from the minimum amount of metadata that must be associated with each ID, which Todd mentioned earlier in the thread...

The real power is in the associated metadata related to that identifier.

So if we look at the ISBN, for example, there is a minimum set of metadata elements that is supposed to be collected by the various national ISBN agencies (not by the International ISBN Agency -- there is no central registry). The set of metadata elements defined within the ISBN standard essentially sets out the 'promise', or the scope within which the ID is unique. If any part of the metadata is different, the ID is different. If all elements of metadata are identical, the ID should be identical too.

So different editions (3rd ed, 4th ed) get different ISBNs because 'edition number' is part of that minimum metadata set. Different bindings (hb, pb) get different ISBNs because the binding is part of that minimum metadata set. Different covers on otherwise identical paperbacks don't always get different ISBNs because the cover image is not part of that minimum metadata set (though for practical stock control purposes, publishers may well assign different ISBNs anyway).

Now looked at from this perspective, the ID itself is not the important part of the discussion -- it is the metadata that is the key, and an ID is simply a shorthand (or a link, or a hash -- pick your terminology) for one particular set of values for that minimum set of metadata elements.
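
A minimal sketch of that idea, assuming a made-up minimum metadata set: the field names, the `Registry` class and the `ID-0001` format are purely illustrative and are not taken from the ISBN standard. The ID is nothing more than a lookup key for one particular combination of values, so identical metadata yields the identical ID and any difference in the set yields a new one. Attributes outside the set (such as the cover image) simply do not take part.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class MinimumMetadata:
    # Hypothetical fields for illustration only; the real standard
    # defines its own minimum set. Cover image is deliberately absent.
    title: str
    edition: int
    binding: str


class Registry:
    """Toy registry: one ID per distinct set of minimum metadata values."""

    def __init__(self) -> None:
        self._ids: dict[MinimumMetadata, str] = {}
        self._counter = count(1)

    def assign(self, meta: MinimumMetadata) -> str:
        # Same metadata values -> same ID; anything different -> new ID.
        if meta not in self._ids:
            self._ids[meta] = f"ID-{next(self._counter):04d}"
        return self._ids[meta]


registry = Registry()
third_pb = registry.assign(MinimumMetadata("Example Title", 3, "paperback"))
fourth_pb = registry.assign(MinimumMetadata("Example Title", 4, "paperback"))
third_hb = registry.assign(MinimumMetadata("Example Title", 3, "hardback"))
# A reprint with a new cover has the same minimum metadata, so the same ID.
third_pb_new_cover = registry.assign(MinimumMetadata("Example Title", 3, "paperback"))
```

Changing which fields belong to `MinimumMetadata` changes what the ID promises, which is the sense in which the metadata set, not the string, defines the scheme.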

Identifier schemes are characterised by their minimum set of metadata elements, and the choices made when defining that set are guided by the purpose of (or use cases for) the identifier -- the functionality the ID is designed to support. ISBN was designed for the book supply chain (originally the physical supply chain, though it mostly works for digital too), and all items with the same ISBN should be functionally identical for the purposes of the book supply chain (but not necessarily identical for other functions). If there are three unsold copies with the same ISBN sitting on a shelf, it does not matter which particular one of the three you purchase.

But the ISBN is not the solution to every problem -- it doesn't help much with rights trading (which by and large operates at indecs / ISTC work level, or the FRBR expression level), it doesn't solve your problems if you are in a publisher's reprint department (because reprints use the same ISBN), and it doesn't solve all the issues in libraries (which is why they use accession numbers to identify individual copies of books, for example).

Graham


Graham Bell
EDItEUR

Tel: +44 20 7503 6418


EDItEUR Limited is a company limited by guarantee, registered in England no 2994705. Registered Office: United House, North Road, London N7 9DP, UK. Website: http://www.editeur.org






On 25 Sep 2014, at 18:44, David (Standards) Singer wrote:

I am wondering whether we have historically focused on the wrong question, notably “is the ID unique?”.  Of the projects I know about, I think too little time was spent on what the ‘promise’ was, and hence ‘unique in what sense?’.

Looking at a specific example, say I have a scheme to give IDs to physical books. If I re-publish the exact same text but with a different page or font size, so the pagination is different, does that get the same ID or a different one?  Well, it must be different if you expect to be able to refer to text by page and line number — did the promise include that that would be stable?

This failure mode — the assigner thought that the promise was X, the user Y — has been the death of labeling systems. If you cannot reliably use the label for a purpose, then it may be use-less.

“Do I have this item in stock?”
“Can I refer to parts of it stably?”
and so on...

On Sep 25, 2014, at 8:04, Bill Kasdorf <bkasdorf@apexcovantage.com> wrote:

I also want to point out that what we really need is not just about books.

Even though there has been frequent discussion on the IG about whether we can _focus_ on books (and the consensus, which I reluctantly went along with, is yes), for something this fundamental we really need to think in terms of a _publication_ or even a _resource_.

Even in traditionally book-dominated sectors like educational publishing, there is a rapid movement away from the concept of a "book" at all. Professors increasingly are willing to let students use any of a range of "textbooks" as a resource for, say, calculus or microbiology, as long as they are useful and have information that is relevant to the course. Increasingly those "books" themselves are being deconstructed, and more importantly most big educational publishers are moving toward a vision in which they develop resources first and books (or parts of books) are just one of many ways of associating, combining, and distributing those resources. And that is done in the context of _all the other stuff out there_ (mostly but not exclusively on the Web).

All that stuff has to be able to be identified, cited, annotated, etc. etc.

I could have written that description just as well in the context of magazines, for which _exactly the same dynamic_ is happening. Right now.

Same for scholarly/STM publishing (where publishing _data_--and citing datasets--is a very live issue). And even in the humanities, where "Digital Humanities" is becoming mainstream (and which is about "works" in the FRBR sense).

And think of all the resources needed in corporate publishing, training, etc.

All of that is "publishing." No publication exists in a closed system. It may think it is in a walled garden but there is a giant jungle outside its walls.

I really think in the pursuit of this identifier issue we MUST take the broadest possible vision or we will come up with something that is useful in one sector (perhaps) but not truly interoperable in the publishing ecosystem and the web in general (the context in which the publishing ecosystem increasingly lives and works) and will thus ultimately prove inadequate.

This is not to replace domain-specific or purpose-built identifiers like the DOI, the ISBN, etc.--those that, as Todd and others pointed out, have metadata and systems associated with them to DO THINGS. Any identifier we come up with should not make those obsolete and ideally should not conflict with them at all. It should make them more interoperable and more useful. This is not a Battle of Identifiers, and those who think One and Only One Identifier is the goal are mistaken. Many identifiers are needed because we need to do many different things with them.

But the identifier we are looking for here--enabling annotation and a myriad other related things on the Web (citation, previews, chunking, etc.)--needs to be radically widely applicable, completely agnostic as to the type of publication or resource it identifies, the format in which that publication or resource is disseminated, and yet durable, persistent, and reliable across formats and across time.

--Bill Kasdorf

-----Original Message-----
From: Laura Dawson [mailto:Laura.Dawson@bowker.com]
Sent: Thursday, September 25, 2014 9:01 AM
To: Todd Carpenter (Gmail); Koji Ishii
Cc: Ivan Herman; David (Standards) Singer; Laura Dawson; Bill Kasdorf; Graham Bell; Phil Madans; W3C Public Digital Publishing IG Mailing List
Subject: Re: As an aside, a possibly interesting read....

Todd, I think you're absolutely right about the difference between librarianship and the trade. It has been the function of libraries to archive, curate, and canonize information since their inception. Trade is about one thing and one thing only - sales. In building infrastructure, we need to support both. What both have in common is a need for effective discovery - directing a reader to the book they want. So much of the metadata will be shared in common - that which describes the book; the metadata describing the terms by which a reader may have it will differ depending on... well, the terms - the environment in which the reader is discovering the book.

That all said, I can envision a world where - for the purposes of curation and archiving - there exists a "canonical" version of a book at a URI that could well consist of the ISBN for that book (as Koji described), but if you want to own the book, you are directed to whichever platforms support it, and you choose which one you want to read on. But that presupposes an authority to govern that system. I would say the ISBN-International Agency could be that authority, but there is one important issue that prevents that - no publisher is required to report back to ISBN-IA which ISBNs get assigned to which books. ISBNs are issued in blocks - and in the case of larger publishers, many never see the light of day. ISBN-IA does not maintain a database of the ISBNs that get assigned - that is down to the registration agencies (such as Bowker, Nielsen, national libraries). And the publishers don't always report back to the RA's which numbers they are assigning to which things.

Also to be considered - in a world of self-publishing, ISBNs frequently are not assigned at all. Books are available in proprietary systems only (Kindle), and not easily discoverable. Amazon is said to be publishing about 2000 of these per week. We have no idea what they are, if they are books or "shorts", fiction, memoir, cookbooks - only Amazon has that data, and the data is provided by author/publishers who are not necessarily familiar with metadata conventions and effective description.

So, to be succinct, whether distributed or centralized, we need to break down the specific problems based on audience and the pain we're trying to solve. Probably won't be a single solution.

On 9/25/14, 2:58 AM, "Todd Carpenter (Gmail)" <tcarpenter@niso.org> wrote:

There is a tremendous problem with distributed systems when it comes to
canonical information and standard identifiers, namely the
metadata that is associated with that identifier.  An identifier is (or
better put, should be) just a dumb (i.e., without embedded meaning),
unique string of characters. The structure of that string, while
systematically important, is beside the point. Whether an identifier is
expressed as a 16-digit string, or as a URI, or anything else is not finally the point.

The real power is in the associated metadata related to that identifier.
While there is tremendous overhead in a centralized system, such systems are
critically important in a well-functioning ID system. Without a
controlling system, there will be no standard set of associated
metadata.  Now, how well that metadata is created, managed, curated and
controlled are open questions (as Laura certainly knows), but without
some authority driving compliance, there will inevitably be an
increasing divergence of metadata quality, practice and interoperability.


Also to Ivan's question about work-level IDs, there is work being done
by OCLC to develop a true FRBR Work-level identifier based on their
data store of libraries' bibliographic data. This ID is derived by
analysis of the collection once the items are released and then catalogued.
I am not certain that a similar work-level ID would be possible in
trade, outside of being done by the author, agent or rights manager to
truly combine all of the works (in a FRBR sense) under a single ID.
Combining, say, the hardcover book of a story, the comic book version of
that same story, the Blu-ray of that story, the Broadway play of
that story, and the Swedish translation of the book under a single
Work-level ID is only something that can be done after the fact,
because their expressions are very, very different. The closest that we
might come to identifying that pre-production is to ID the rights
associated with a particular intellectual property. And while it may be
useful in practice, I don't know that it would be useful in application.
That, I expect, would in the end only serve the purpose of making lots of IP lawyers very wealthy.

Todd




On Sep 25, 2014, at 5:07 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

Maybe this was already discussed, but I'm in favor of a distributed
ID system rather than a single, central system.

Take DNS. Or Java namespaces. Their prefixes come from domain names that
authors own, which are unique, and authors can then define the rest however they like.
If a publisher wants to use ISBN, they could use, for instance,
<epub://isbn-international.org/123456789>.

Since what we want is to identify publications, as long as authors or
publications agree to use consistent domains/postfixes, I guess we can
guarantee the uniqueness.
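
A minimal sketch of such a distributed scheme, assuming the `epub://` form used in the example above (it is not a registered URI scheme, and the function names here are hypothetical). Uniqueness comes from the domain name; the local part is whatever the domain owner decides.

```python
from urllib.parse import urlsplit


def make_id(authority: str, local_part: str) -> str:
    """Build an identifier from a domain the assigner owns plus a local part."""
    # Domain names are case-insensitive, so normalise the authority.
    return f"epub://{authority.lower()}/{local_part}"


def split_id(identifier: str) -> tuple[str, str]:
    """Recover (authority, local_part) from an identifier built by make_id."""
    parts = urlsplit(identifier)
    return parts.netloc, parts.path.lstrip("/")


isbn_based = make_id("isbn-international.org", "123456789")
publisher_based = make_id("publisher.example", "123456789")
```

The same local part under two different domains gives two different identifiers, so no central registry is needed for uniqueness. Nothing in this sketch says what metadata goes with either one.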

Maybe there are more use cases for the ID than identifying
publications? The use cases I have in mind are links between
publications and OA, and these I think a distributed system can handle.

/koji

On Sep 25, 2014, at 12:51 PM, Ivan Herman <ivan@w3.org> wrote:


On 24 Sep 2014, at 23:14, Laura Dawson <Laura.Dawson@bowker.com> wrote:

True. It's a cluttered road.

We are in a really dangerous business!

Ivan


On 9/24/14, 5:12 PM, "David (Standards) Singer" <singer@apple.com> wrote:


On Sep 24, 2014, at 12:16, LAURA DAWSON <ljndawson@gmail.com> wrote:

Yes, Bowker were a DOI registration agency and I can tell you
that the  associated systems and metadata were the primary reason
DOIs for trade  books (as opposed to STEM/scholarly) never took
off.

So you see, Ivan, the road to book URIs is littered with a couple
of corpses.

It's not just books.  I was on a project that needed something for
recordings many years ago, and that road was also strewn with
corpses.


On 9/24/14, 3:13 PM, "Bill Kasdorf" <bkasdorf@apexcovantage.com> wrote:

Actually, the DOI _is_ used for this, mainly by scholarly/STM
publishers,  as well as for chapters of books--typically one DOI
for the book and a  DOI for each chapter (and sometimes DOIs at
even lower component  levels,  most often for figures and
tables). And these are _agnostic_ as to  format, they typically
mean "the book" and "the chapter" in the  abstract  sense. When
you click on one of these DOIs you are usually then given  your
choice of what format, whether you have access, how to obtain
access, etc.

But it requires the associated systems, metadata, registration
agency,  etc. to make it work. To belabor a point, though, in
that context it  does  work. There are a gazillion of them. The
whole scholarly/STM ecosystem  is  now dependent on DOIs.

Those that use the DOI for this use CrossRef DOIs, which
_should_ be  expressed as URIs (and increasingly are).
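
As a rough illustration of expressing a DOI as a URI, here is a minimal sketch. It assumes the public `https://doi.org/` resolver as the prefix; the function name is hypothetical, and percent-encoding of unusual characters in the suffix is left out for brevity.

```python
def doi_to_uri(doi: str, resolver: str = "https://doi.org/") -> str:
    """Turn a bare DOI name such as '10.1000/182' into a resolvable URI."""
    name = doi.strip()
    # Accept the common 'doi:' label in front of the name.
    if name.lower().startswith("doi:"):
        name = name[4:].strip()
    # A DOI name is a prefix starting with '10.' and a suffix, separated by '/'.
    if not name.startswith("10.") or "/" not in name:
        raise ValueError(f"not a DOI name: {doi!r}")
    return resolver + name


handbook_uri = doi_to_uri("doi:10.1000/182")
```

Where that URI leads is still decided by whoever registered the DOI.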

But all that is purely under the control of the publisher
(including  what  the DOI links to and what that destination
provides--not necessarily  the  content itself); it doesn't
address "work" in the way librarians mean  "work," and it
requires the systems I mentioned (including the Handle  system on
which DOI is based). It would not work for our need to point  to
the "work itself" or some component of the work. So the answer in
a  purely standard web-world sense is still no.

--Bill K

-----Original Message-----
From: Laura Dawson [mailto:Laura.Dawson@bowker.com]
Sent: Wednesday, September 24, 2014 2:55 PM
To: Ivan Herman; Graham Bell
Cc: Laura Dawson; Phil Madans; Bill Kasdorf; W3C Public Digital
Publishing IG Mailing List
Subject: Re: As an aside, a possibly interesting read....

As it stands now, no. So a book's "home" on the web (regardless of
edition) is not standardizable at this point unless you want to
go down the DOI road (please let's not go down the DOI road).

On 9/24/14, 4:13 AM, "Ivan Herman" <ivan@w3.org> wrote:

Thanks for all the interesting discussion...

However: all this is to say that there does not seem to be any
existing (and viable) option to uniquely identify (preferably
through a URI) a 'work' (whether in the ISTC or the FRBR sense).
Which is a problem for metadata as well as for archiving. :-(
Tell me I am wrong, please...

Ivan


On 24 Sep 2014, at 24:19, Graham Bell <graham@editeur.org> wrote:

And they can be treated this way in ONIX too. As I said,

they are not (strictly) an attribute of the ISBN, though they
may be  presented as such in various systems

G

NB repeatable because the ISBN is associated directly with
only one  work, but can be indirectly associated (through that
work) with  several other works.


On 23 Sep 2014, at 21:12, LAURA DAWSON wrote:

Yes, even at Bowker we made them a repeatable attribute on
the ISBN  record.

From: "Madans, Phil" <Phil.Madans@hbgusa.com>
Date: Tuesday, September 23, 2014 at 3:13 PM
To: Laura Dawson <ljndawson@gmail.com>, Graham Bell
<graham@editeur.org>, Bill Kasdorf
<bkasdorf@apexcovantage.com>, Ivan Herman <ivan@w3.org>, W3C
Public Digital Publishing IG Mailing List
<public-digipub-ig-comment@w3.org>
Subject: Re: As an aside, a possibly interesting read....

I stand corrected on the assignment of the ISTC. Bad choice of words.
I was speaking more of how I would have to manage them
internally at the systems level -- that's how I think about
these things -- and that would be as an attribute. That all
depends on how titles systems are structured, and I'm not
saying ours is the best way to do things, but I think the
way we do it is how most do it these days. From a practical
standpoint, I'm not sure how else I could handle them. If I
publish an English and a Spanish edition of a work, and the
ISTCs are different, then they would be attributes of the
ISBNs so that I could keep them linked internally. We are
already doing this, as is most everyone else, and I think
that is why the ISTC was such a hard sell.

------------------------------------------------------------
Phil Madans | Executive Director of Digital Publishing
Technology |  Hachette Book Group | 237 Park Avenue NY 10017
|212-364-1415 |  phil.madans@hbgusa.com

From: LAURA DAWSON <ljndawson@gmail.com>
Date: Tuesday, September 23, 2014 at 2:22 PM
To: Graham Bell <graham@editeur.org>, Phil Madans
<phil.madans@hbgusa.com>, Bill Kasdorf
<bkasdorf@apexcovantage.com>,
Ivan Herman <ivan@w3.org>, W3C Public Digital Publishing IG
Mailing List <public-digipub-ig-comment@w3.org>
Subject: Re: As an aside, a possibly interesting read....

Bowker was an ISTC registration agency until recently. We
pulled out  because of the lack of support in the US, and
refer the few curious  to Nielsen.

From: Graham Bell <graham@editeur.org>
Date: Tuesday, September 23, 2014 at 2:09 PM
To: Phil Madans <Phil.Madans@hbgusa.com>, Laura Dawson
<ljndawson@gmail.com>, Bill Kasdorf
<bkasdorf@apexcovantage.com>,
Ivan Herman <ivan@w3.org>, W3C Public Digital Publishing IG
Mailing List <public-digipub-ig-comment@w3.org>
Subject: Re: As an aside, a possibly interesting read....

What Phil and Laura have written certainly summarises -- and
illustrates -- the debate over identifiers.

But the text below (from Phil) is a little misleading.

Whether an ISTC
is a real work Identifier or not is a matter of debate. I
disagree that it is. It is actually an attribute of the
ISBN -- that is how they are assigned.
Different ISBNs of the same master content might have
different ISTCs.
Translations for instance.

The 'rules' of the ISTC say that translations are by
definition  different works, and MUST have different ISTCs
(though those ISTCs  will be related to each other -- one is a
'derived work', and this  close relationship is recorded in
the registration metadata for the ISTCs themselves). This
contrasts with library practice, where 'work' is something
at a higher level and two translations are
actually termed two 'expressions' of the same 'work'. In
library terms, the ISTC is an expression identifier. See the
attached PDF (a slide from  a training session that I deliver
fairly regularly) for a summary of  how the <indecs> model on
which ISTC and ONIX are based compares  with  the FRBR library
model. There is -- as far as I know -- no public  identifier
that works at the FRBR:work level, though libraries may  have
internal IDs.

And I'm pretty sure ISTCs can be assigned without an ISBN
(and without any product ID at all, in fact) -- they are not
(strictly) an attribute of the ISBN, though they may be presented
as such in various systems.
They can be registered based on a manuscript, prior to there
being a product.

On the other hand, there's no doubt that ISTC has so far
proved  unpopular among publishers, for some of the reasons
Laura and Phil  list, and its actual usage is minimal.


Graham





Graham Bell
EDItEUR

Tel: +44 20 7503 6418
Mob: +44 7887 754958


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me











David Singer
Manager, Software Standards, Apple Inc.






Received on Thursday, 25 September 2014 21:41:58 UTC