RE: Dereferencing, Re: Jotting down some discussion topics from Bill Kasdorf on 2016-09-21 (public-digipub-ig@w3.org from September 2016)

From: Bill Kasdorf <bkasdorf@apexcovantage.com>
Date: Wed, 21 Sep 2016 09:12:45 +0000
To: Mike Perlman <perlmanm@me.com>, Marcos Caceres <marcos@marcosc.com>
CC: Peter Krautzberger <peter.krautzberger@mathjax.org>, Leonard Rosenthol <lrosenth@adobe.com>, Ivan Herman <ivan@w3.org>, Michael Smith <mike@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <CY1PR0601MB1422F3BE3BE2B7C6FE56E0DFDFF60@CY1PR0601MB1422.namprd06.prod.outlook.>

This is why I'm always careful to use the term "Crossref DOI" when referring to a DOI used in and by Crossref. There are lots of different types of DOIs in the world. The entertainment industry, for example, has EIDR, whose identifiers are also DOIs. What needs to be kept in mind, for the DOI and, I would argue, for any identifier, is that an identifier must be associated with certain metadata to be meaningful, and typically with an infrastructure that manages its use. That's what Crossref provides for scholarly articles, books, chapters, proceedings, etc. that have Crossref DOIs, based on consistent metadata associated with each Crossref DOI. EIDR does the same thing for films and videos, but with a different complement of metadata. The DOI itself is just a "dumb number," deliberately so. It means nothing outside the context in which it's used. But it is _extremely_ useful in those contexts.

I should also point out that a Crossref DOI doesn't necessarily "link to" the content it identifies. It links to whatever the present "owner" of that DOI wants it to link to, and that indirection is what enables it to be persistent. Say a book published by Publisher A gets a Crossref DOI, which might point to Publisher A's website, to the book's own web page, or to a page offering purchasing options. Publisher A is then acquired by Publisher B, and Publisher B changes the URL to which that Crossref DOI points: to its own website, or even to a page saying "here's a bunch of places where you can buy this book" or "if you are a member of a subscribing institution you can get direct access to this book," or whatever.

The DOI isn't flaky. It does just what it was designed to do.
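To make the indirection concrete, here is a minimal sketch assuming a toy in-memory registry. The identifier string never changes; only the owner and the URL registered against it do. The class and method names (`HandleRegistry`, `register`, `transfer`, `update_target`, `resolve`), the DOI `10.9999/example.book`, and the publisher URLs are all hypothetical. Real DOI resolution (doi.org, built on the Handle System) is far more involved than this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Registration:
    """Hypothetical record: who currently controls an identifier and where it points."""
    owner: str
    target_url: str


@dataclass
class HandleRegistry:
    """Toy stand-in for a registration agency: identifier -> current registration."""
    records: dict[str, Registration] = field(default_factory=dict)

    def register(self, doi: str, owner: str, target_url: str) -> None:
        if doi in self.records:
            raise ValueError(f"{doi} is already registered")
        self.records[doi] = Registration(owner, target_url)

    def transfer(self, doi: str, old_owner: str, new_owner: str) -> None:
        # E.g. Publisher A is acquired by Publisher B; the DOI string is unchanged.
        record = self.records[doi]
        if record.owner != old_owner:
            raise PermissionError(f"{old_owner} does not control {doi}")
        record.owner = new_owner

    def update_target(self, doi: str, owner: str, new_url: str) -> None:
        # Only the current owner may change where the identifier points.
        record = self.records[doi]
        if record.owner != owner:
            raise PermissionError(f"{owner} does not control {doi}")
        record.target_url = new_url

    def resolve(self, doi: str) -> str:
        # The resolver answers with wherever the identifier points today.
        return self.records[doi].target_url


if __name__ == "__main__":
    registry = HandleRegistry()
    doi = "10.9999/example.book"  # hypothetical DOI
    registry.register(doi, "Publisher A", "https://publisher-a.example/books/42")
    registry.transfer(doi, "Publisher A", "Publisher B")
    registry.update_target(doi, "Publisher B", "https://publisher-b.example/where-to-buy/42")
    print(registry.resolve(doi))  # https://publisher-b.example/where-to-buy/42
```

Anyone who cited the DOI before the acquisition still gets a working answer afterwards, because they cited the identifier, not the URL.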

I will separately send a reminder that we should be careful not to confuse identifiers with locators.

Bill Kasdorf
VP and Principal Consultant | Apex CoVantage
p: 734-904-6252  m:   734-904-6252
ISNI: http://isni.org/isni/0000000116490786

ORCiD: https://orcid.org/0000-0001-7002-4786

-----Original Message-----
From: Mike Perlman [mailto:perlmanm@me.com] 
Sent: Wednesday, September 21, 2016 3:44 AM
To: Marcos Caceres
Cc: Peter Krautzberger; Leonard Rosenthol; Ivan Herman; Michael Smith; W3C Digital Publishing IG
Subject: Re: Dereferencing, Re: Jotting down some discussion topics

It looks like DOIs can also be flaky.
Maybe something like 2-factor identifiers are what’s required?

From “Scholarly Open Access”, about DOIs:
https://scholarlyoa.com/2016/09/06/more-competition-for-crossref/

And just today
http://doai.io

> On 21 Sep 2016, at 09:23, Marcos Caceres <marcos@marcosc.com> wrote:
> 
> On September 21, 2016 at 4:30:17 PM, Leonard Rosenthol
> (lrosenth@adobe.com) wrote:
>> Also remember, Marcos, that the identifier for a PWP is _NOT_ always a URL.
> 
> I completely agree. Using URLs as identifiers is generally not a great 
> idea, because URLs are so volatile: domains can be lost, swapped, 
> abandoned, or deleted. There is also the "but what will it return?"
> (dereferencing) problem, which is why I don't think we want to go 
> there... but here we are :)
> 
> Here is a real life example from one of my favorite books about HTML:
> 
> http://diveintohtml5.info/
> 
> There is a dramatic history around that book and its author (which I 
> won't go into, but it would make for a great book!). It used to be 
> hosted at a different URL, but the original author rage-deleted the 
> domain along with all traces of their online persona.
> 
> The web dev community found a way to bring the book back to life 
> (thanks to its CC-BY-3.0 license and, IIRC, archive.org).
> 
> The book is also published in physical form as:
> https://www.amazon.com/HTML5-Up-Running-Mark-Pilgrim/dp/0596806027
> 
> With identifiers:
> ISBN-13: 978-0596806026
> ISBN-10: 0596806027
> 
> Anyway, the point is... same book, different URL. URLs can't reliably 
> identify things, and when they are used to, they do it badly (e.g., XML namespaces).
> 
>> It could be
>> w3id, a DOI or an ISBN. We need a term that works for all of those 
>> types of identifiers (since we also have an “off the web” manifestation, which I know you hate).
> 
> I don't hate it (sorry if I came across that way).
> 
> Because URLs are not stable, it's desirable to separate the identifying 
> aspects from the protocol used to acquire a publication.
> That is, http(s) is the protocol used to acquire a resource that 
> identifies itself by a w3id, DOI, ISBN, or whatever; or, in the container 
> case, the container holds the resource(s) that together form a publication 
> identified by a w3id, a DOI, or an ISBN.
>
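The separation of identity from acquisition discussed in this thread can be sketched as follows. Identity comes from a normalized identifier, while URLs are just a list of places the thing can currently be fetched from. The `Publication` type and the `normalize` and `same_work` helpers are hypothetical. The only real-world rule encoded is the standard ISBN-10 to ISBN-13 conversion (prefix 978, drop the old check digit, recompute with alternating weights 1 and 3), which shows that 0596806027 and 978-0596806026 name the same book. Tagging the web copy with the print ISBN is purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13: prefix 978, drop the old check digit, recompute."""
    digits = isbn10.replace("-", "").upper()
    if len(digits) != 10:
        raise ValueError(f"not an ISBN-10: {isbn10}")
    core = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize(scheme: str, value: str) -> tuple[str, str]:
    """Reduce an identifier to a canonical (scheme, value) pair used for comparison."""
    scheme = scheme.lower()
    if scheme == "isbn":
        compact = value.replace("-", "")
        return ("isbn", isbn10_to_isbn13(compact) if len(compact) == 10 else compact)
    # Other schemes (w3id, DOI, ...) are compared as-is in this sketch.
    return (scheme, value)


@dataclass
class Publication:
    """Hypothetical model: identity lives in `identifier`; `locations` may change freely."""
    scheme: str
    identifier: str
    locations: list[str] = field(default_factory=list)

    def key(self) -> tuple[str, str]:
        return normalize(self.scheme, self.identifier)


def same_work(a: Publication, b: Publication) -> bool:
    # Compare identifiers, never URLs.
    return a.key() == b.key()


if __name__ == "__main__":
    web = Publication("isbn", "0596806027", ["http://diveintohtml5.info/"])
    print_ed = Publication(
        "isbn",
        "978-0596806026",
        ["https://www.amazon.com/HTML5-Up-Running-Mark-Pilgrim/dp/0596806027"],
    )
    print(same_work(web, print_ed))  # True: same book, different URL
```

If the domain hosting a copy disappears, only an entry in `locations` goes stale; the key used to decide "is this the same publication?" is untouched.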

Received on Wednesday, 21 September 2016 09:13:18 UTC