RE: Publishing new vocabularies - which resolver service from Simon.Cox@csiro.au on 2016-04-26 (public-perma-id@w3.org from April 2016)

From: <Simon.Cox@csiro.au>
Date: Tue, 26 Apr 2016 22:33:57 +0000
To: <phila@w3.org>, <monica.omodei@gmail.com>
CC: <public-perma-id@w3.org>
Message-ID: <74317b8a7a1342379974f8d8c793fab5@exch1-mel.nexus.csiro.au>

> DOIs are shared around as identifiers for articles and datasets etc. Except they very often dereference to a landing page *about* that thing:

AFAIK it is a *requirement* of the DOI system that they dereference to a landing page.

I agree with Phil here on the consequence, but would state it a little more carefully - the web of *data* is broken. 
In particular, it's hard to see how deep linking (i.e. to sub-items, extracts etc.) would work in this world. 

Overall: 
1. the strong institutional commitments behind DOI are very very valuable and important, however
2. DOI was scoped to a particular use case (publishing journal articles and thereabouts). 

The DataCite extension of DOI to datasets (i.e. snapshots represented in files) is plausible, since the intention is still that they would be used whole. 
But more fine-grained applications are probably beyond the scope of DOI without quite a rethink around content negotiation (conneg). 
I know that is going on, but it is hard to see a very usable solution emerging. 

Simon 

-----Original Message-----
From: Phil Archer [mailto:phila@w3.org] 
Sent: Friday, 22 April 2016 10:14 PM
To: Monica Omodei <monica.omodei@gmail.com>
Cc: public-perma-id@w3.org
Subject: Re: Publishing new vocabularies - which resolver service

Hi Monica, pls see inline below - although in fairness, I must warn you, I have a big hobby horse to ride here.

On 22/04/2016 00:26, Monica Omodei wrote:
> Thanks for that input Phil which reinforced our own thinking. The 
> whole research infrastructure environment is under review and we are 
> in a holding pattern organisationally for another 12 months. We are 
> confident the outcome will be more sustained support into the future 
> but it may look funny if we set up a new persistence service at this 
> point in time no matter what neutral domain name we use. We do have 
> many services which we know will continue though no matter under what 
> organisational structure - eg Research Data Australia, Research 
> Vocabularies Australia, our Research Grants and Projects portal, our DataCite DOI minting service .....

Understood.

>
> One option I didn't add was minting DOIs for vocab terms. Not an 
> option I would have considered before but I note that Content 
> negotiation is being implemented by DOI Registration Agencies for 
> their DOI names as I read here
>
> https://www.doi.org/doi_handbook/5_Applications.html#5.4.1
>
> What do you think ?

Sorry, but to be blunt, I think it's a dreadful idea that should be squashed at the earliest opportunity.

I can rant about this for hours, but will desist (I've deleted several versions of this e-mail, complete with logs of requests to the examples in that DOI handbook showing what happens when you dereference them with different Accept headers and counting the external dependencies along the way).

DOIs are not a magic solution; they are just another redirection service, like purl.org. You are looking for an alternative to purl.org because the future of that service is currently uncertain. Why jump from one centralised redirection service to another? There is nothing fundamentally different about a DOI that makes it any more stable or more decentralised than purl.org. The organisational commitments behind it are stronger, yes, but that's the only difference.
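To make "just another redirection service" concrete: at its core, such a service is a lookup table from identifier to current location, answered with an HTTP redirect. Below is a minimal, hypothetical Python sketch using only the standard library. It is not a description of how doi.org or purl.org are actually built. The table entries, the placeholder host vocabs.example.org and the choice of 303 are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Optional, Tuple

# Hypothetical redirect table: identifier path -> current location.
# Whoever keeps this table up to date decides whether the identifiers
# stay resolvable; the domain it is served from does not change that.
REDIRECTS: Dict[str, str] = {
    # Landing-page target that dx.doi.org returned in the curl log
    # later in this message.
    "10.1126/science.169.3946.635":
        "http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635",
    # Hypothetical vocabulary term; vocabs.example.org is a placeholder.
    "au-research/vocabulary/anzsrc-for/2008/":
        "https://vocabs.example.org/anzsrc-for/2008/",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for a requested path."""
    target = REDIRECTS.get(path.lstrip("/"))
    if target is None:
        return 404, None  # unknown identifier
    return 303, target  # 303 See Other (status choice is an assumption)


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers GET and HEAD with the redirect from the table (no body)."""

    def do_GET(self) -> None:
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET
```

To try it locally, `HTTPServer(("localhost", 8080), RedirectHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would serve the table. The point of the sketch is how little machinery is involved: the same few lines run equally well under a domain you control.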

What's that? They don't depend on any technology? OK, let's see. What does doi:10.1103/PhysRevD.89.032002 identify? Well, it might be the paper that describes the discovery of the Higgs boson, or it might be something else entirely, depending on your choice of resolver:
http://philarcher.org/10.1103/PhysRevD.89.032002


More seriously, DOIs are shared around as identifiers for articles and datasets etc. Except they very often dereference to a landing page *about* that thing: one identifier, two resources. At which point the Web is well and truly broken. They work well for people following links and tracking citations, but they're not good for machines. DOIs allow people to run Web sites and not bother about managing for persistence. 
But they are a terrible fit for vocabulary lookup by machines.

OK, soap box going back in the cupboard.

Cheers

Phil.






There are some fundamental problems with DOIs:

- they are centralised and are no more stable



I did some digging into the link you sent. Taking the example they give:

Dependency 1: You're minded not to use purl.org because its future is uncertain. But you are happy to be dependent on another centralised redirection service? That seems odd for a start but let's follow it up and dereference the example in the DOI handbook:

curl -IiH "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" \
  http://dx.doi.org/10.1126/science.169.3946.635

HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Vary: Accept
Location: http://data.crossref.org/10.1126%2Fscience.169.3946.635
Expires: Fri, 22 Apr 2016 12:00:26 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 195
Date: Fri, 22 Apr 2016 11:15:31 GMT

So I'm redirected from doi.org to crossref.org - chalk up external centralised dependency no. 2. If I deref that, I get:

curl -H "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" \
  http://data.crossref.org/10.1126%2Fscience.169.3946.635

{
   "indexed":{
      "date-parts":[[2015,12,26]],
      "date-time":"2015-12-26T11:19:00Z",
      "timestamp":1451128740588},
   "reference-count":0,
   "publisher":"American Association for the Advancement of Science (AAAS)",
   "issue":"3946",
   "published-print":{
      "date-parts":[[1970,8,14]]

... blah blah. So there's some machine-readable data.
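The request above sent an Accept header ranking CSL JSON (q=1.0) above RDF/XML (q=0.5). What came back from data.crossref.org was JSON, consistent with the server honouring those q-values. Below is a minimal Python sketch of parsing and ranking such a header. It is a simplified reading of the HTTP rules: parameters other than q are dropped, there is no wildcard-specificity tie-breaking, and a malformed q-value counts as 0. It is not the code any DOI resolver actually runs.

```python
from typing import List, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an HTTP Accept header into (media type, q) pairs,
    most preferred first. Simplified: non-q parameters are ignored."""
    ranked = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranked.append((media_type, q))
    # sorted() is stable (also with reverse=True): equal q keeps header order
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Applied to the header from the log, `parse_accept` puts `application/vnd.citationstyles.csl+json` (1.0) ahead of `application/rdf+xml` (0.5).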

What happens if I deref that original DOI with a different Accept header (here, just curl's default)?

curl -I http://dx.doi.org/10.1126/science.169.3946.635

HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Vary: Accept
Location: http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635
Expires: Fri, 22 Apr 2016 12:18:37 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 209
Date: Fri, 22 Apr 2016 11:30:22 GMT

A different domain again.
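Taken together, the two logs show one identifier sent to two different hosts, depending only on the Accept header. The Python sketch below reproduces that observed behaviour under an assumed rule. If the client's most preferred acceptable type is in a set of metadata formats, it redirects to the metadata service; otherwise it redirects to the publisher's landing page. `METADATA_TYPES`, `LANDING_PAGES` and the rule itself are assumptions for illustration. doi.org's actual negotiation logic is not documented here.

```python
from typing import List, Optional
from urllib.parse import quote

# Assumed set of "machine" formats that get routed to the metadata service.
METADATA_TYPES = {
    "application/vnd.citationstyles.csl+json",
    "application/rdf+xml",
}

# Landing page as observed in the second curl log; in reality this target
# is maintained by the publisher, separately from the metadata.
LANDING_PAGES = {
    "10.1126/science.169.3946.635":
        "http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635",
}


def preferred_types(accept: str) -> List[str]:
    """Acceptable media types from an Accept header, highest q first."""
    ranked = []
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if fields[0] and q > 0:  # q=0 means "not acceptable"
            ranked.append((fields[0].lower(), q))
    ranked.sort(key=lambda pair: pair[1], reverse=True)  # stable sort
    return [media_type for media_type, _ in ranked]


def negotiate(doi: str, accept: str = "*/*") -> Optional[str]:
    """Location a resolver following the assumed rule would send."""
    types = preferred_types(accept)
    if types and types[0] in METADATA_TYPES:
        # DOI percent-encoded into a single path segment, matching the
        # data.crossref.org Location in the first log.
        return "http://data.crossref.org/" + quote(doi, safe="")
    # Everything else, including curl's default */*, gets the landing page.
    return LANDING_PAGES.get(doi)
```

What the sketch makes visible: a single header decides which resource a client gets, and the two targets are maintained by different parties. That is the "one identifier, two resources" problem in miniature.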

Is the info I get back from that third service the same? It's consistent, but it's clearly not the same, showing that it's managed separately. Want to bet how long it will be before the two get out of sync?
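This message counts the services involved informally. The same count can be taken mechanically from the Location headers. The Python sketch below takes the two redirect chains exactly as they appear in the logs and lists the distinct hosts a client ends up depending on. It makes no network requests, and `external_hosts` is a hypothetical helper name.

```python
from typing import List
from urllib.parse import urlsplit


def external_hosts(chains: List[List[str]]) -> List[str]:
    """Distinct hostnames touched while following each redirect chain,
    in first-seen order."""
    seen: List[str] = []
    for chain in chains:
        for url in chain:
            host = urlsplit(url).hostname  # lower-cased, port stripped
            if host and host not in seen:
                seen.append(host)
    return seen


# Redirect chains copied from the two curl logs in this message.
CHAINS = [
    ["http://dx.doi.org/10.1126/science.169.3946.635",
     "http://data.crossref.org/10.1126%2Fscience.169.3946.635"],
    ["http://dx.doi.org/10.1126/science.169.3946.635",
     "http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635"],
]
```

`external_hosts(CHAINS)` yields `dx.doi.org`, `data.crossref.org` and `www.sciencemag.org`: three separately run services behind one identifier.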

This is a system that, by design, has different people managing the data and the metadata, all with centralised identifiers that are technically no more robust than the one you're minded not to use. When I hear Andrew Treloar, Jan Brase et al. talking about DOIs I want to scream: what actually does a DOI identify? Is it the dataset? 'Cos if I deref a DOI I normally get a landing page, which is not the same thing.



I am aware of the greater institutional support for DOIs but that's all it is.

Persistence is a choice, not a technical thing

DOIs are a solution to a problem people choose to give themselves, i.e. the problem of not managing their own Web space properly. I know they are beloved of the publishing industry and are seen as the answer to all sorts of problems, but this example shows why they are antithetical to the architecture of the Web.




>
> Monica
>
>
>
> The broken link on our persistence awareness guide is embarrassing - 
> apparently all the necessary redirects were requested and ostensibly 
> implemented by the company who did our new web site but there was a 
> problem with url file type extensions (don't ask me why that would be a problem).
> Still waiting for the full solution, but meanwhile our content person has
> done some manual fixes.
>
>
>
> On Tue, Apr 19, 2016 at 8:43 PM, Phil Archer <phila@w3.org> wrote:
>
>> In my view, running your own domain name is the best option. ANDS 
>> already has persistence as a core idea although, irony of ironies, I 
>> notice that links I have in some of my work to your guidance on
>> persistent identifiers lead to, yes, 404s :-( (see
>> http://philarcher.org/diary/2013/uripersistence/#ands and the links 
>> to things like
>> http://ands.org.au/guides/persistent-identifiers-awareness.html)
>>
>> Set up a domain that doesn't mention any organisational name, set it 
>> up for one job only - to provide permanent URIs - and make it transferable.
>> id.org.au seems to be available at the moment, for example.
>>
>> purl.org was set up as *the* solution, thus creating a single point 
>> of failure - which is now close to failure. Relying on someone else's 
>> centralised system, be it purl or DOI, leaves you susceptible to 
>> their future failures. Stepping in would mean taking on 
>> responsibility for other people's redirections as well as your own.
>> So be decentralised, use the Web, look after your own needs.
>>
>> My 2 cents.
>>
>> Phil.
>>
>>
>>
>>
>>
>>
>>
>> On 19/04/2016 05:42, Monica Omodei wrote:
>>
>>> I hope this is an appropriate forum to ask advice.
>>>
>>> We have been using the purl.org resolver service for some years to 
>>> provide a globally unique, persistent, resolvable identifier for 
>>> Australian research grants so they can be used in metadata 
>>> describing research outputs like publications, data, software etc. 
>>> They resolve currently to a view page in Research Data Australia - 
>>> researchdata.ands.org.au. There is an API but it returns only JSON 
>>> currently, not XML or RDF.
>>>
>>> We decided not to run our own resolver service because we are not an 
>>> ongoing organisation with a guaranteed persistent domain and felt 
>>> that a public resolver service was more suitable. At the time 
>>> purl.org seemed the right choice. We are now struggling because we 
>>> cannot make any changes. As has been noted in this forum the admin 
>>> UI is not available at the moment. We also were concerned when the 
>>> ability to create our own sub-domain was removed, as we envisage wanting
>>> to hand over some sub-domains from our root, *au-research*, to
>>> different parties for maintenance.
>>>
>>> We now also support the Vocabularies Australia Service 
>>> http://ands.org.au/online-services/research-vocabularies-australia 
>>> using the SISSVoc software which was established to assist with the 
>>> publication and widespread use of scientific vocabularies.
>>>
>>> The ANZSRC Field of Research Vocabulary (ABS 1297.0), which is used
>>> to classify research and its outputs in Australia and New Zealand,
>>> is also published through this service and we maintain purls for
>>> these vocabulary terms, e.g.
>>>
>>> http://purl.org/au-research/vocabulary/anzsrc-for/2008/
>>>
>>> We want to extend this provision of PURLs to other vocabularies but
>>> we cannot use the same purl domain because of the current problems
>>> with the OCLC-supported purl.org service.
>>>
>>> We need to decide whether to -
>>>
>>>      - switch to w3id.org as the domain for the other vocabulary purls
>>>      - run our own resolver service under a domain name we think can be
>>>      transferred to another organisation if/when necessary
>>>      - look for another public resolver service
>>>      - be patient and wait for the situation with purl.org to resolve
>>>      itself (no pun intended)
>>>
>>> Comments welcome,
>>>
>>> Monica Omodei
>>> Project Manager
>>> Australian National Data Service
>>>
>>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Tuesday, 26 April 2016 22:34:47 UTC