Re: Publishing new vocabularies - which resolver service from Phil Archer on 2016-04-22 (public-perma-id@w3.org from April 2016)

From: Phil Archer <phila@w3.org>
Date: Fri, 22 Apr 2016 13:14:10 +0100
To: Monica Omodei <monica.omodei@gmail.com>
Cc: "public-perma-id@w3.org" <public-perma-id@w3.org>
Message-ID: <571A1592.3070102@w3.org>

Hi Monica, pls see inline below - although in fairness, I must warn you, 
I have a big hobby horse to ride here.

On 22/04/2016 00:26, Monica Omodei wrote:
> Thanks for that input Phil which reinforced our own thinking. The whole
> research infrastructure environment is under review and we are in a holding
> pattern organisationally for another 12 months. We are confident the
> outcome will be more sustained support into the future but it may look
> funny if we set up a new persistence service at this point in time no
> matter what neutral domain name we use. We do have many services which we
> know will continue though no matter under what organisational structure -
> eg Research Data Australia, Research Vocabularies Australia, our Research
> Grants and Projects portal, our DataCite DOI minting service .....

Understood.

>
> One option I didn't add was minting DOIs for vocab terms. Not an option I
> would have considered before but I note that Content negotiation is being
> implemented by DOI Registration Agencies for their DOI names as I read here
>
> https://www.doi.org/doi_handbook/5_Applications.html#5.4.1
>
> What do you think ?

Sorry but to be blunt I think it's a dreadful idea that should be 
squashed at the earliest opportunity.

I can rant about this for hours but will desist (I've deleted several 
versions of this e-mail, complete with logs of requests to the examples 
in that DOI handbook showing what happens when you dereference them with 
different accept headers and counting the external dependencies along 
the way).

DOIs are not a magic solution, they're just another redirection service, 
like purl.org. You are looking for an alternative to purl.org as the 
future of that service is currently uncertain. Why jump from one 
centralised redirection service to another? There is nothing 
fundamentally different about a DOI that makes it any more stable or 
more decentralised than purl.org. The organisational commitments behind 
it are stronger, yes, but that's the only difference.

What's that? They don't depend on any technology? OK, let's see. What 
does doi:10.1103/PhysRevD.89.032002 identify? Well, it might be the 
paper that describes the discovery of the Higgs boson or it might be 
something else entirely, depending on your choice of resolver: 
http://philarcher.org/10.1103/PhysRevD.89.032002
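
To make the resolver point concrete, here is a minimal sketch in Python. The helper name doi_to_url is hypothetical, and the two resolver bases are simply the ones that appear in this message; nothing here is part of any DOI tooling. It only shows that the same doi: string becomes a different HTTP URI depending on which resolver you prepend.

```python
def doi_to_url(doi: str, resolver: str) -> str:
    """Build an HTTP URI for a DOI under a given resolver base (illustrative helper)."""
    # Strip the optional "doi:" scheme prefix; the rest is the DOI name itself.
    name = doi[4:] if doi.lower().startswith("doi:") else doi
    return resolver.rstrip("/") + "/" + name


doi = "doi:10.1103/PhysRevD.89.032002"
# Same identifier, two resolvers, two different URIs.
for base in ("http://dx.doi.org", "http://philarcher.org"):
    print(doi_to_url(doi, base))
```

What each URI actually returns is entirely up to whoever runs that resolver, which is the point.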

More seriously, DOIs are shared around as identifiers for articles and 
datasets etc. Except they very often dereference to a landing page 
*about* that thing: one identifier, two resources. At which point the 
Web is well and truly broken. They work well for people following links 
and tracking citations, but they're not good for machines. DOIs allow 
people to run Web sites and not bother about managing for persistence. 
But they are a terrible fit for vocabulary look up by machines.

OK, soap box going back in the cupboard.

Cheers

Phil.






There are some fundamental problems with DOIs:

- they are centralised and are no more stable than purl.org



I did some digging into the link you sent. Taking the example they give:

Dependency 1: You're minded not to use purl.org because its future is 
uncertain. But you are happy to be dependent on another centralised 
redirection service? That seems odd for a start but let's follow it up 
and dereference the example in the DOI handbook:

curl -IiH "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" \
  http://dx.doi.org/10.1126/science.169.3946.635
HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Vary: Accept
Location: http://data.crossref.org/10.1126%2Fscience.169.3946.635
Expires: Fri, 22 Apr 2016 12:00:26 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 195
Date: Fri, 22 Apr 2016 11:15:31 GMT

So I'm redirected from doi.org to crossref.org - chalk up external 
centralised dependency no. 2. If I deref that I get

curl -H "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" \
  http://data.crossref.org/10.1126%2Fscience.169.3946.635
{
   "indexed":{
     "date-parts":[[2015,12,26]],
     "date-time":"2015-12-26T11:19:00Z",
     "time stamp":1451128740588},
   "reference-count":0,
   "publisher":"American Association for the Advancement of Science (AAAS)",
   "issue":"3946",
   "published-print":{
     "date-parts":[[1970,8,14]]

... blah blah. So there's some machine readable data.
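
Why JSON rather than RDF/XML? The Accept header above gives the CSL JSON type q=1.0 and RDF/XML only q=0.5. A rough sketch of that ranking follows. It is a simplification, assuming only q parameters matter; real servers apply the fuller HTTP content negotiation rules, and rank_accept is a made-up name for illustration.

```python
def rank_accept(header: str) -> list:
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    ranked = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0  # q defaults to 1.0 when absent
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                q = float(value)
        ranked.append((media_type, q))
    # sorted() is stable, so equal q values keep their original order.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


accept = "application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0"
print(rank_accept(accept)[0][0])  # application/vnd.citationstyles.csl+json
```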

What happens if I deref that original DOI with a different accept header?

curl -I http://dx.doi.org/10.1126/science.169.3946.635
HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Vary: Accept
Location: http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635
Expires: Fri, 22 Apr 2016 12:18:37 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 209
Date: Fri, 22 Apr 2016 11:30:22 GMT

A different domain again.
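
Tallying that up: the sketch below counts the distinct hosts touched across the two requests. The URLs are copied from the request lines and Location headers in the logs above; the labels and the hosts_in helper are my own, and this makes no network calls.

```python
from urllib.parse import urlsplit

# Redirect chains as observed in the curl logs above (request URL, then Location).
CHAINS = {
    "CSL JSON preferred": [
        "http://dx.doi.org/10.1126/science.169.3946.635",
        "http://data.crossref.org/10.1126%2Fscience.169.3946.635",
    ],
    "curl default Accept": [
        "http://dx.doi.org/10.1126/science.169.3946.635",
        "http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635",
    ],
}


def hosts_in(chain):
    """List the distinct hostnames in a redirect chain, in order of first use."""
    seen = []
    for url in chain:
        host = urlsplit(url).hostname
        if host not in seen:
            seen.append(host)
    return seen


all_hosts = sorted({h for chain in CHAINS.values() for h in hosts_in(chain)})
print(len(all_hosts), all_hosts)
```

Three separately run hosts for one identifier, and that is before following whatever sciencemag.org does next.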

Is the info I get back from that third service the same? It's 
consistent, but it's clearly not the same, showing that it's managed 
separately. Want to bet how long it will be before the two get out of sync?

This is a system that by design has different people managing the data 
and the metadata, all with centralised identifiers that are technically 
no more robust than the one you're minded not to use. When I hear Andrew 
Treloar, Jan Brasse et al talking about DOIs I want to scream - what 
actually does a DOI identify? Is it the dataset? 'Cos if I deref a DOI I 
normally get a landing page, which is not the same thing.



I am aware of the greater institutional support for DOIs but that's all 
it is.

Persistence is a choice, not a technical thing

DOIs are a solution to a problem people choose to give themselves, i.e. 
the problem of not managing their own Web space properly. I know they 
are beloved of the publishing industry and are seen as the answer to 
all sorts of problems but this example proves why they are antithetical 
to the architecture of the Web.




>
> Monica
>
>
>
> The broken link on our persistence awareness guide is embarrassing -
> apparently all the necessary redirects were requested and ostensibly
> implemented by the company who did our new web site but there was a problem
> with url file type extensions (don't ask me why that would be a problem).
> Still waiting for the full solution but meanwhile our content person has done
> some manual fixes.
>
>
>
> On Tue, Apr 19, 2016 at 8:43 PM, Phil Archer <phila@w3.org> wrote:
>
>> In my view, running your own domain name is the best option. ANDS already
>> has persistence as a core idea although, irony of ironies, I notice that
>> links I have in some of my work to your guidance on persistent identifiers
>> leads to, yes, 404s :-( (see
>> http://philarcher.org/diary/2013/uripersistence/#ands and the links to
>> things like
>> http://ands.org.au/guides/persistent-identifiers-awareness.html)
>>
>> Set up a domain that doesn't mention any organisational name, set it up
>> for one job only - to provide permanent URIs - and make it transferable.
>> id.org.au seems to be available at the moment, for example.
>>
>> purl.org was set up as *the* solution, thus creating a single point of
>> failure - which is now close to failure. Relying on someone else's
>> centralised system, be it purl or DOI, leaves you susceptible to their
>> future failures. Stepping in would mean taking on responsibility for other
>> people's redirections as well as your own. So be decentralised, use the
>> Web, look after your own needs.
>>
>> My 2 cents.
>>
>> Phil.
>>
>>
>>
>>
>>
>>
>>
>> On 19/04/2016 05:42, Monica Omodei wrote:
>>
>>> I hope this is an appropriate forum to ask advice.
>>>
>>> We have been using the purl.org resolver service for some years to
>>> provide
>>> a globally unique, persistent, resolvable identifier for Australian
>>> research grants so they can be used in metadata describing research
>>> outputs
>>> like publications, data, software etc. They resolve currently to a view
>>> page in Research Data Australia - researchdata.ands.org.au. There is an
>>> API
>>> but it returns only JSON currently, not XML or RDF
>>>
>>> We decided not to run our own resolver service because we are not an
>>> ongoing organisation with a guaranteed persistent domain and felt that a
>>> public resolver service was more suitable. At the time purl.org seemed
>>> the
>>> right choice. We are now struggling because we cannot make any changes. As
>>> has been noted in this forum the admin UI is not available at the
>>> moment. We also were concerned when the ability to create our own
>>> sub-domain was removed as we envisage wanting to hand over some sub-domains
>>> from our root, *au-research, *to different parties for maintenance.
>>>
>>> We now also support the Vocabularies Australia Service
>>> http://ands.org.au/online-services/research-vocabularies-australia using
>>> the SISSVoc software which was established to assist with the publication
>>> and widespread use of scientific vocabularies.
>>>
>>> The ANZSRC Field of Research Vocabulary (ABS 1297.0) which is used to
>>> classify research and its outputs in Australia and New Zealand is also
>>> published through this service and we maintain purls for these vocabulary
>>> terms eg
>>>
>>> http://purl.org/au-research/vocabulary/anzsrc-for/2008/
>>>
>>> We want to extend this provision of PURLs to other vocabularies but we
>>> cannot use the same purl domain because of the current problems with the
>>> OCLC-supported purl.org service
>>>
>>> We need to decide whether to -
>>>
>>>      - switch to w3id.org as the domain for the other vocabulary purls
>>>      - run our own resolver service under a domain name we think can be
>>>      transferred to another organisation if/when necessary
>>>      - look for another public resolver service
>>>      - be patient and wait for the situation with purl.org to resolve
>>> itself
>>>      (no pun intended)
>>>
>>> Comments welcome,
>>>
>>> Monica Omodei
>>> Project Manager
>>> Australian National Data Service
>>>
>>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Friday, 22 April 2016 12:14:18 UTC