Re: Publishing new vocabularies - which resolver service from Phil Archer on 2016-04-26 (public-perma-id@w3.org from April 2016)

From: Phil Archer <phila@w3.org>
Date: Tue, 26 Apr 2016 18:31:22 +0100
To: "Haag, Jason" <jason.haag.ctr@adlnet.gov>
Cc: Monica Omodei <monica.omodei@gmail.com>, "public-perma-id@w3.org" <public-perma-id@w3.org>
Message-ID: <571FA5EA.6000903@w3.org>
Hi Jason, pls see inline below.

On 26/04/2016 16:33, Haag, Jason wrote:
> For what it's worth we also stay away from DOIs and recently moved to
> using w3id.org instead of purl.org, but the problems/challenges of
> decentralization and a single point of failure are still there. If
> something happened with w3id.org similar to purl.org (lack of
> resources, etc.) then how will we provide readability/resolvability of
> our vocabularies?

This risk can be minimised by having a domain designed solely to take 
care of your own persistent URIs. Once you set up a service that resolves other 
people's as well as your own, sooner or later an accountant will ask why 
you're subsidising other people's needs. OCLC did a fantastic job 
supporting purl.org for many years and is still providing the power and 
connectivity for its server.
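
To show how small a single-purpose resolver can be, here is a minimal
Python sketch. Everything in it is hypothetical: the REDIRECTS table,
the example.org targets, the resolve() helper and the choice of a 303
response are assumptions for illustration, not a description of how
purl.org or w3id.org work.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical redirect table: persistent path prefix -> current location.
REDIRECTS = {
    "/vocab/anzsrc-for/2008/": "https://vocabs.example.org/anzsrc-for/2008/",
    "/grant/": "https://data.example.org/grants/",
}


def resolve(path):
    """Return (status, location) for a persistent path.

    The longest matching prefix wins and the rest of the path is
    appended to the target, so one entry covers a whole namespace.
    """
    for prefix in sorted(REDIRECTS, key=len, reverse=True):
        if path.startswith(prefix):
            return HTTPStatus.SEE_OTHER, REDIRECTS[prefix] + path[len(prefix):]
    return HTTPStatus.NOT_FOUND, None


class ResolverHandler(BaseHTTPRequestHandler):
    """Answers every GET or HEAD with a redirect or a 404, nothing else."""

    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.end_headers()

    do_HEAD = do_GET
```

Handing ResolverHandler to http.server.HTTPServer would serve it. The
point is that the only state is the redirect table, which is what makes
such a domain easy to transfer to another organisation.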

>
> Potential Epiphany: Is there a semantic web property that provides a
> secondary option for vocabulary redundancy / fail-over? rdfs:seeAlso
> provides additional information, but I'm wondering if RDF publishing
> practices should be updated to allow for some form of redundancy in
> situations where persistent IRIs don't live up to their purpose and
> become unsteady. Hmmm... does this seem impractical and a lot of extra
> work, though?

My colleague Sandro Hawke has proposed something along these lines 
although it's no more than an informal idea over a beer. The idea is 
that we'd have a property like definitionText, the value of which would 
(obviously) be the definition of that term. I could take the 
definitionText from one of your terms, copy and paste it into my vocab 
and because the definition text was identical, it would be the *same* 
term, even though it had a different URI. That gives us redundancy in 
that multiple copies would be made without multiplying the actual number 
of terms. Lots of issues there but it might be worth pursuing one day.
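
Purely to make the idea concrete, here is a sketch of how matching on
definition text might work. Nothing in it is an existing property or
tool: definitionText is only the informal idea described above, the
example URIs are invented, and collapsing whitespace before comparing
is an extra assumption so that re-wrapped copies still match.

```python
import hashlib
from collections import defaultdict


def definition_key(text):
    """Hash a definition after collapsing runs of whitespace."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def group_same_terms(terms):
    """Group term URIs whose definition text is identical.

    `terms` maps a term URI to its (hypothetical) definitionText value.
    Only groups with more than one URI are returned.
    """
    groups = defaultdict(set)
    for uri, text in terms.items():
        groups[definition_key(text)].add(uri)
    return [uris for uris in groups.values() if len(uris) > 1]


# Invented example data: two vocabularies carrying the same definition.
terms = {
    "https://a.example.org/ns#Grant":
        "A sum of money awarded to fund a research project.",
    "https://b.example.org/vocab#Grant":
        "A sum of money awarded\n  to fund a research project.",
    "https://b.example.org/vocab#Dataset":
        "A collection of data published as a unit.",
}
same = group_same_terms(terms)
```

Here `same` holds a single group containing the two Grant URIs. One of
the "lots of issues" shows up straight away: any edit to a definition,
however small, silently splits the group.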

>
> Phil Archer's suggestion of setting up your own domain and resolution
> service for your community probably is the best approach, but for
> communities that don't have the resources or have disparate/evolving
> publishing practices this isn't always an option (at least initially).
> Persistence is a choice, but sometimes it's an unavoidable one.
>
> Have there ever been any talks of the W3C providing a long-term
> persistence service/solution other than what the W3id community group
> has supported? Just curious. If not, perhaps a stand-alone resolution
> service that could be run or managed independently by each community
> within their own servers & domain might be useful.

Yes, but we always come down to money. Our own material is persistent. 
See https://www.w3.org/Consortium/Persistence.html and 
http://philarcher.org/diary/2011/20yearsofmlarchives/. It is no 
exaggeration to say that docs in w3.org/TR and /ns at least will almost 
certainly outlive anyone reading this.

But... we don't have the resources to make an open ended commitment to 
manage a service for everyone indefinitely.

Cheers

Phil


>
>
>
>
> On Fri, Apr 22, 2016 at 7:14 AM, Phil Archer <phila@w3.org> wrote:
>> Hi Monica, pls see inline below - although in fairness, I must warn you, I
>> have a big hobby horse to ride here.
>>
>> On 22/04/2016 00:26, Monica Omodei wrote:
>>>
>>> Thanks for that input Phil which reinforced our own thinking. The whole
>>> research infrastructure environment is under review and we are in a
>>> holding
>>> pattern organisationally for another 12 months. We are confident the
>>> outcome will be more sustained support into the future but it may look
>>> funny if we set up a new persistence service at this point in time no
>>> matter what neutral domain name we use. We do have many services which we
>>> know will continue though no matter under what organisational structure -
>>> eg Research Data Australia, Research Vocabularies Australia, our Research
>>> Grants and Projects portal, our DataCite DOI minting service .....
>>
>>
>> Understood.
>>
>>>
>>> One option I didn't add was minting DOIs for vocab terms. Not an option I
>>> would have considered before but I note that Content negotiation is being
>>> implemented by DOI Registration Agencies for their DOI names as I read
>>> here
>>>
>>> https://www.doi.org/doi_handbook/5_Applications.html#5.4.1
>>>
>>> What do you think ?
>>
>>
>> Sorry but to be blunt I think it's a dreadful idea that should be squashed
>> at the earliest opportunity.
>>
>> I can rant about this for hours but will desist (I've deleted several
>> versions of this e-mail, complete with logs of requests to the examples in
>> that DOI handbook showing what happens when you dereference them with
>> different accept headers and counting the external dependencies along the
>> way).
>>
>> DOIs are not a magic solution, they are just another redirection service, like
>> purl.org. You are looking for an alternative to purl.org as the future of
>> that service is currently uncertain. Why jump from one centralised
>> redirection service to another? There is nothing fundamentally different
>> about a DOI that makes it any more stable or more decentralised than
>> purl.org. The organisational commitments behind it are stronger, yes, but
>> that's the only difference.
>>
>> What's that? They don't depend on any technology? OK, let's see. What does
>> doi:10.1103/PhysRevD.89.032002 identify? Well, it might be the paper that
>> describes the discovery of the Higgs boson or it might be something else
>> entirely, depending on your choice of resolver
>> http://philarcher.org/10.1103/PhysRevD.89.032002
>>
>> More seriously, DOIs are shared around as identifiers for articles and
>> datasets etc. Except they very often dereference to a landing page *about*
>> that thing: one identifier, two resources. At which point the Web is well
>> and truly broken. They work well for people following links and tracking
>> citations, but they're not good for machines. DOIs allow people to run Web
>> sites and not bother about managing for persistence. But they are a terrible
>> fit for vocabulary look up by machines.
>>
>> OK, soap box going back in the cupboard.
>>
>> Cheers
>>
>> Phil.
>>
>>
>>
>>
>>
>>
>> There are some fundamental problems with DOIs:
>>
>> - they are centralised and are no more stable
>>
>>
>>
>> I did some digging into the link you sent. Taking the example they give:
>>
>> Dependency 1: You're minded not to use purl.org because its future is
>> uncertain. But you are happy to be dependent on another centralised
>> redirection service? That seems odd for a start but let's follow it up and
>> dereference the example in the DOI handbook:
>>
>> curl -IiH "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" \
>>   http://dx.doi.org/10.1126/science.169.3946.635
>> HTTP/1.1 303 See Other
>> Server: Apache-Coyote/1.1
>> Vary: Accept
>> Location: http://data.crossref.org/10.1126%2Fscience.169.3946.635
>> Expires: Fri, 22 Apr 2016 12:00:26 GMT
>> Content-Type: text/html;charset=utf-8
>> Content-Length: 195
>> Date: Fri, 22 Apr 2016 11:15:31 GMT
>>
>> So I'm redirected from doi.org to crossref.org - chalk up external
>> centralised dependency no. 2. If I deref that I get
>>
>> curl -H "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" \
>>   http://data.crossref.org/10.1126%2Fscience.169.3946.635
>> {
>>    "indexed":{
>>      "date-parts":[[2015,12,26]],
>>        "date-time":"2015-12-26T11:19:00Z",
>>        "time stamp":1451128740588},
>>        "reference-count":0,
>>        "publisher":"American Association for the Advancement of Science (AAAS)",
>>        "issue":"3946",
>>        "published-print":{
>>          "date-parts":[[1970,8,14]]
>>
>> ... blah blah. So there's some machine readable data.
>>
>> What happens if I deref that original DOI with a different accept header?
>>
>> curl -I http://dx.doi.org/10.1126/science.169.3946.635
>> HTTP/1.1 303 See Other
>> Server: Apache-Coyote/1.1
>> Vary: Accept
>> Location: http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635
>> Expires: Fri, 22 Apr 2016 12:18:37 GMT
>> Content-Type: text/html;charset=utf-8
>> Content-Length: 209
>> Date: Fri, 22 Apr 2016 11:30:22 GMT
>>
>> A different domain again.
>>
>> Is the info I get back from that third service the same? It's consistent,
>> but it's clearly not the same, showing that it's managed separately. Want to
>> bet how long it will be before the two get out of sync?
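
To make the comparison above easier to repeat, here is a small Python
sketch. It takes the two Location headers from the curl output quoted
above and lists the hosts that a single DOI ends up depending on. The
OBSERVED table and the hosts_involved() helper are illustrative names,
the "*/*" key assumes curl's default Accept header for the second
request, and no network requests are made.

```python
from urllib.parse import urlparse

DOI_URL = "http://dx.doi.org/10.1126/science.169.3946.635"

# Location headers copied from the two curl responses quoted above,
# keyed by the Accept header that was sent with each request.
OBSERVED = {
    "application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0":
        "http://data.crossref.org/10.1126%2Fscience.169.3946.635",
    "*/*":
        "http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635",
}


def hosts_involved(start_url, observed):
    """List each distinct host touched, starting with the resolver itself."""
    hosts = [urlparse(start_url).hostname]
    for location in observed.values():
        host = urlparse(location).hostname
        if host not in hosts:
            hosts.append(host)
    return hosts


print(hosts_involved(DOI_URL, OBSERVED))
# prints ['dx.doi.org', 'data.crossref.org', 'www.sciencemag.org']
```

Three hosts for one identifier, each run by a different organisation.
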
>>
>> This is a system that by design has different people managing the data and
>> the metadata, all with centralised identifiers that are technically no more
>> robust than the one you're minded not to use. When I hear Andrew Treloar,
>> Jan Brasse et al talking about DOIs I want to scream - what actually does a
>> DOI identify? Is it the dataset? 'Cos if I deref a DOI I normally get a
>> landing page, which is not the same thing.
>>
>>
>>
>> I am aware of the greater institutional support for DOIs but that's all it
>> is.
>>
>> Persistence is a choice, not a technical thing
>>
>> DOIs are a solution to a problem people choose to give themselves, i.e. the
>> problem of not managing their own Web space properly. I know they are
>> beloved of the publishing industry and are seen as the answer to all sorts
>> of problems but this example proves why they are antithetical to the
>> architecture of the Web.
>>
>>
>>
>>
>>
>>>
>>> Monica
>>>
>>>
>>>
>>> The broken link on our persistence awareness guide is embarrassing -
>>> apparently all the necessary redirects were requested and ostensibly
>>> implemented by the company who did our new web site but there was a
>>> problem
>>> with url file type extensions (don't ask me why that would be a problem).
>>> Still waiting for the full solution but meanwhile our content person has
>>> done some manual fixes.
>>>
>>>
>>>
>>> On Tue, Apr 19, 2016 at 8:43 PM, Phil Archer <phila@w3.org> wrote:
>>>
>>>> In my view, running your own domain name is the best option. ANDS already
>>>> has persistence as a core idea although, irony of ironies, I notice that
>>>> links I have in some of my work to your guidance on persistent
>>>> identifiers
>>>> lead to, yes, 404s :-( (see
>>>> http://philarcher.org/diary/2013/uripersistence/#ands and the links to
>>>> things like
>>>> http://ands.org.au/guides/persistent-identifiers-awareness.html)
>>>>
>>>> Set up a domain that doesn't mention any organisational name, set it up
>>>> for one job only - to provide permanent URIs - and make it transferable.
>>>> id.org.au seems to be available at the moment, for example.
>>>>
>>>> purl.org was set up as *the* solution, thus creating a single point of
>>>> failure - which is now close to failure. Relying on someone else's
>>>> centralised system, be it purl or DOI, leaves you susceptible to their
>>>> future failures. Stepping in would mean taking on responsibility for
>>>> other
>>>> people's redirections as well as your own. So be decentralised, use the
>>>> Web, look after your own needs.
>>>>
>>>> My 2 cents.
>>>>
>>>> Phil.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 19/04/2016 05:42, Monica Omodei wrote:
>>>>
>>>>> I hope this is an appropriate forum to ask advice.
>>>>>
>>>>> We have been using the purl.org resolver service for some years to
>>>>> provide
>>>>> a globally unique, persistent, resolvable identifier for Australian
>>>>> research grants so they can be used in metadata describing research
>>>>> outputs
>>>>> like publications, data, software etc. They resolve currently to a view
>>>>> page in Research Data Australia - researchdata.ands.org.au. There is an
>>>>> API
>>>>> but it returns only JSON currently, not XML or RDF
>>>>>
>>>>> We decided not to run our own resolver service because we are not an
>>>>> ongoing organisation with a guaranteed persistent domain and felt that a
>>>>> public resolver service was more suitable. At the time purl.org seemed
>>>>> the
>>>>> right choice. We are now struggling because we cannot make any changes.
>>>>> As
>>>>> has been noted in this forum the admin UI is not available at the
>>>>> moment. We also were concerned when the ability to create our own
>>>>> sub-domain was removed as we envisage wanting to hand over some sub-domains
>>>>> from our root, *au-research*, to different parties for maintenance.
>>>>>
>>>>> We now also support the Vocabularies Australia Service
>>>>> http://ands.org.au/online-services/research-vocabularies-australia using
>>>>> the SISSVoc software which was established to assist with the
>>>>> publication
>>>>> and widespread use of scientific vocabularies.
>>>>>
>>>>> The ANZSRC Field of Research Vocabulary (ABS 1297.0) which is used to
>>>>> classify research and its outputs in Australia and New Zealand is also
>>>>> published through this service and we maintain purls for these
>>>>> vocabulary
>>>>> terms eg
>>>>>
>>>>> http://purl.org/au-research/vocabulary/anzsrc-for/2008/
>>>>>
>>>>> We want to extend this provision of PURLs to other vocabularies but we
>>>>> cannot use the same purl domain because of the current problems with
>>>>> the OCLC-supported purl.org service.
>>>>>
>>>>> We need to decide whether to -
>>>>>
>>>>>       - switch to w3id.org as the domain for the other vocabulary purls
>>>>>       - run our own resolver service under a domain name we think can be
>>>>>       transferred to another organisation if/when necessary
>>>>>       - look for another public resolver service
>>>>>       - be patient and wait for the situation with purl.org to resolve
>>>>> itself
>>>>>       (no pun intended)
>>>>>
>>>>> Comments welcome,
>>>>>
>>>>> Monica Omodei
>>>>> Project Manager
>>>>> Australian National Data Service
>>>>>
>>>>>
>>>> --
>>>>
>>>>
>>>> Phil Archer
>>>> W3C Data Activity Lead
>>>> http://www.w3.org/2013/data/
>>>>
>>>> http://philarcher.org
>>>> +44 (0)7887 767755
>>>> @philarcher1
>>>>
>>>
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Tuesday, 26 April 2016 17:31:34 UTC