Re: Publishing new vocabularies - which resolver service from Sandro Hawke on 2016-04-26 (public-perma-id@w3.org from April 2016)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 26 Apr 2016 15:10:25 -0400
To: Phil Archer <phila@w3.org>, "Haag, Jason" <jason.haag.ctr@adlnet.gov>
Cc: Monica Omodei <monica.omodei@gmail.com>, "public-perma-id@w3.org" <public-perma-id@w3.org>
Message-ID: <571FBD21.1090005@w3.org>
On 04/26/2016 01:31 PM, Phil Archer wrote:
> Hi Jason, pls see inline below.
>
> On 26/04/2016 16:33, Haag, Jason wrote:
>> For what it's worth we also stay away from DOIs and recently moved to
>> using w3id.org instead of purl.org, but the problems/challenges of
>> decentralization and a single point of failure are still there. If
>> something happened with w3id.org similar to purl.org (lack of
>> resources, etc.) then how will we provide readability/resolvability of
>> our vocabularies?
>
> This can be minimised by having a domain just designed to take care of 
> your own persistent URIs. Once you set up a service that resolves 
> other people's as well as your own, sooner or later an accountant will 
> ask why you're subsidising other people's needs. OCLC did a fantastic 
> job supporting purl.org for many years and is still providing the 
> power and connectivity for its server.
>
>>
>> Potential Epiphany: Is there a semantic web property that provides a
>> secondary option for vocabulary redundancy / fail-over? rdfs:seeAlso
>> provides additional information, but I'm wondering if RDF publishing
>> practices should be updated to allow for some form of redundancy in
>> situations where persistent IRIs don't live up to their purpose and
>> become unsteady. Hmmm... does this seem impractical and like a lot of
>> extra work, though?
>
> My colleague Sandro Hawke has proposed something along these lines 
> although it's no more than an informal idea over a beer. The idea is 
> that we'd have a property like definitionText, the value of which 
> would (obviously) be the definition of that term. I could take the 
> definitionText from one of your terms, copy and paste it into my vocab 
> and because the definition text was identical, it would be the *same* 
> term, even though it had a different URI. That gives us redundancy in 
> that multiple copies would be made without multiplying the actual 
> number of terms. Lots of issues there but it might be worth pursuing 
> one day.
>

FWIW, I defined an ontology for this behavior:
http://www.w3.org/ns/mics

but I don't know of anyone using it. If I were doing it again, I might
do it more simply.

I also wrote a blog post about using this idea with JSON:
https://decentralyze.com/2014/06/30/growjson/
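
A rough sketch of the "same definition text means same term" idea, in
Python. This is not the mics ontology, only an illustration: term_key,
merge_by_definition and the example.org / example.net URIs are made up,
and collapsing whitespace before comparing is an assumption about what
"identical" should mean.

```python
import hashlib


def term_key(definition_text: str) -> str:
    """Derive an identity key for a term from its definition text alone."""
    # Assumption: collapse runs of whitespace so re-wrapped copies still match.
    normalised = " ".join(definition_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def merge_by_definition(*vocabs):
    """Group term URIs from several vocabularies by their definition-text key."""
    groups = {}
    for vocab in vocabs:
        for uri, text in vocab.items():
            groups.setdefault(term_key(text), set()).add(uri)
    return groups


# Two hypothetical vocabularies publishing the same definition under different URIs.
vocab_a = {
    "http://example.org/vocab#grant":
        "A sum of money awarded to fund a research project.",
}
vocab_b = {
    "http://example.net/terms/grant":
        "A sum of money awarded\n   to fund a research project.",
}

groups = merge_by_definition(vocab_a, vocab_b)
for key, uris in groups.items():
    print(key[:12], sorted(uris))
```

Both URIs end up in one group, so losing either copy does not lose the
term. Any real design would have to decide how strict the text
comparison is.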


       -- Sandro

>>
>> Phil Archer's suggestion of setting up your own domain and resolution
>> service for your community probably is the best approach, but for
>> communities that don't have the resources or have disparate/evolving
>> publishing practices this isn't always an option (at least initially).
>> Persistence is a choice, but sometimes it's an unavoidable one.
>>
>> Have there ever been any talks of the W3C providing a long-term
>> persistence service/solution other than what the W3id community group
>> has supported? Just curious. If not, perhaps a stand-alone resolution
>> service that could be run or managed independently by each community
>> within their own servers & domain might be useful.
>
> Yes, but we always come down to money. Our own material is persistent. 
> See https://www.w3.org/Consortium/Persistence.html and 
> http://philarcher.org/diary/2011/20yearsofmlarchives/. It is no 
> exaggeration to say that docs in w3.org/TR and /ns at least will 
> almost certainly outlive anyone reading this.
>
> But... we don't have the resources to make an open ended commitment to 
> manage a service for everyone indefinitely.
>
> Cheers
>
> Phil
>
>
>>
>>
>>
>>
>> On Fri, Apr 22, 2016 at 7:14 AM, Phil Archer <phila@w3.org> wrote:
>>> Hi Monica, pls see inline below - although in fairness, I must warn you,
>>> I have a big hobby horse to ride here.
>>>
>>> On 22/04/2016 00:26, Monica Omodei wrote:
>>>>
>>>> Thanks for that input Phil which reinforced our own thinking. The whole
>>>> research infrastructure environment is under review and we are in a
>>>> holding pattern organisationally for another 12 months. We are confident
>>>> the outcome will be more sustained support into the future but it may
>>>> look funny if we set up a new persistence service at this point in time
>>>> no matter what neutral domain name we use. We do have many services
>>>> which we know will continue though no matter under what organisational
>>>> structure - eg Research Data Australia, Research Vocabularies Australia,
>>>> our Research Grants and Projects portal, our DataCite DOI minting
>>>> service .....
>>>
>>>
>>> Understood.
>>>
>>>>
>>>> One option I didn't add was minting DOIs for vocab terms. Not an option
>>>> I would have considered before but I note that Content negotiation is
>>>> being implemented by DOI Registration Agencies for their DOI names as I
>>>> read here
>>>>
>>>> https://www.doi.org/doi_handbook/5_Applications.html#5.4.1
>>>>
>>>> What do you think ?
>>>
>>>
>>> Sorry but to be blunt I think it's a dreadful idea that should be
>>> squashed at the earliest opportunity.
>>>
>>> I can rant about this for hours but will desist (I've deleted several
>>> versions of this e-mail, complete with logs of requests to the examples
>>> in that DOI handbook showing what happens when you dereference them with
>>> different accept headers and counting the external dependencies along
>>> the way).
>>>
>>> DOIs are not a magic solution, they are just another redirection service,
>>> like purl.org. You are looking for an alternative to purl.org as the
>>> future of that service is currently uncertain. Why jump from one
>>> centralised redirection service to another? There is nothing
>>> fundamentally different about a DOI that makes it any more stable or
>>> more decentralised than purl.org. The organisational commitments behind
>>> it are stronger, yes, but that's the only difference.
>>>
>>> What's that? They don't depend on any technology? OK, let's see. What
>>> does doi:10.1103/PhysRevD.89.032002 identify? Well, it might be the paper
>>> that describes the discovery of the Higgs boson or it might be something
>>> else entirely, depending on your choice of resolver
>>> http://philarcher.org/10.1103/PhysRevD.89.032002
>>>
>>> More seriously, DOIs are shared around as identifiers for articles and
>>> datasets etc. Except they very often dereference to a landing page
>>> *about* that thing: one identifier, two resources. At which point the Web
>>> is well and truly broken. They work well for people following links and
>>> tracking citations, but they're not good for machines. DOIs allow people
>>> to run Web sites and not bother about managing for persistence. But they
>>> are a terrible fit for vocabulary look up by machines.
>>>
>>> OK, soap box going back in the cupboard.
>>>
>>> Cheers
>>>
>>> Phil.
>>>
>>>
>>>
>>>
>>>
>>>
>>> There are some fundamental problems with DOIs:
>>>
>>> - they are centralised and are no more stable
>>>
>>>
>>>
>>> I did some digging into the link you sent. Taking the example they give:
>>>
>>> Dependency 1: You're minded not to use purl.org because its future is
>>> uncertain. But you are happy to be dependent on another centralised
>>> redirection service? That seems odd for a start but let's follow it up
>>> and dereference the example in the DOI handbook:
>>>
>>> curl -IiH "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" \
>>>   http://dx.doi.org/10.1126/science.169.3946.635
>>> HTTP/1.1 303 See Other
>>> Server: Apache-Coyote/1.1
>>> Vary: Accept
>>> Location: http://data.crossref.org/10.1126%2Fscience.169.3946.635
>>> Expires: Fri, 22 Apr 2016 12:00:26 GMT
>>> Content-Type: text/html;charset=utf-8
>>> Content-Length: 195
>>> Date: Fri, 22 Apr 2016 11:15:31 GMT
>>>
>>> So I'm redirected from doi.org to crossref.org - chalk up external
>>> centralised dependency no. 2. If I deref that I get
>>>
>>> curl -H "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" \
>>>   http://data.crossref.org/10.1126%2Fscience.169.3946.635
>>> {
>>>    "indexed":{
>>>      "date-parts":[[2015,12,26]],
>>>        "date-time":"2015-12-26T11:19:00Z",
>>>        "time stamp":1451128740588},
>>>        "reference-count":0,
>>>        "publisher":"American Association for the Advancement of Science (AAAS)",
>>>        "issue":"3946",
>>>        "published-print":{
>>>          "date-parts":[[1970,8,14]]
>>>
>>> ... blah blah. So there's some machine readable data.
>>>
>>> What happens if I deref that original DOI with a different accept header?
>>>
>>> curl -I http://dx.doi.org/10.1126/science.169.3946.635
>>> HTTP/1.1 303 See Other
>>> Server: Apache-Coyote/1.1
>>> Vary: Accept
>>> Location: http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635
>>> Expires: Fri, 22 Apr 2016 12:18:37 GMT
>>> Content-Type: text/html;charset=utf-8
>>> Content-Length: 209
>>> Date: Fri, 22 Apr 2016 11:30:22 GMT
>>>
>>> A different domain again.
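
The dependency count in the transcripts above can be tallied with a few
lines of Python. This is only a sketch: hosts_in_chain is a made-up
helper, and the URLs are copied from the curl output rather than fetched
live, so nothing here touches the network.

```python
from urllib.parse import urlsplit


def hosts_in_chain(urls):
    """Return the distinct hosts in a list of URLs, in order of first appearance."""
    seen = []
    for url in urls:
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen


# Request URL plus the Location header returned for the CSL JSON / RDF request.
json_chain = [
    "http://dx.doi.org/10.1126/science.169.3946.635",
    "http://data.crossref.org/10.1126%2Fscience.169.3946.635",
]
# Request URL plus the Location header returned with curl's default Accept header.
html_chain = [
    "http://dx.doi.org/10.1126/science.169.3946.635",
    "http://www.sciencemag.org/cgi/doi/10.1126/science.169.3946.635",
]

all_hosts = hosts_in_chain(json_chain + html_chain)
print(len(all_hosts), all_hosts)
```

One identifier, three hosts: the resolver, the metadata service and the
publisher's site.
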
>>>
>>> Is the info I get back from that third service the same? It's consistent,
>>> but it's clearly not the same, showing that it's managed separately.
>>> Want to bet how long it will be before the two get out of sync?
>>>
>>> This is a system that by design has different people managing the data
>>> and the metadata, all with centralised identifiers that are technically
>>> no more robust than the one you're minded not to use. When I hear Andrew
>>> Treloar, Jan Brasse et al talking about DOIs I want to scream - what
>>> actually does a DOI identify? Is it the dataset? 'Cos if I deref a DOI I
>>> normally get a landing page, which is not the same thing.
>>>
>>>
>>>
>>> I am aware of the greater institutional support for DOIs but that's all
>>> it is.
>>>
>>> Persistence is a choice, not a technical thing
>>>
>>> DOIs are a solution to a problem people choose to give themselves, i.e.
>>> the problem of not managing their own Web space properly. I know they are
>>> beloved of the publishing industry and are seen as the answer to all
>>> sorts of problems but this example proves why they are antithetical to
>>> the architecture of the Web.
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> Monica
>>>>
>>>>
>>>>
>>>> The broken link on our persistence awareness guide is embarrassing -
>>>> apparently all the necessary redirects were requested and ostensibly
>>>> implemented by the company who did our new web site but there was a
>>>> problem with url file type extensions (don't ask me why that would be a
>>>> problem). Still waiting for the full solution but meanwhile our content
>>>> person has done some manual fixes.
>>>>
>>>>
>>>>
>>>> On Tue, Apr 19, 2016 at 8:43 PM, Phil Archer <phila@w3.org> wrote:
>>>>
>>>>> In my view, running your own domain name is the best option. ANDS
>>>>> already has persistence as a core idea although, irony of ironies, I
>>>>> notice that links I have in some of my work to your guidance on
>>>>> persistent identifiers lead to, yes, 404s :-( (see
>>>>> http://philarcher.org/diary/2013/uripersistence/#ands and the links to
>>>>> things like
>>>>> http://ands.org.au/guides/persistent-identifiers-awareness.html)
>>>>>
>>>>> Set up a domain that doesn't mention any organisational name, set it up
>>>>> for one job only - to provide permanent URIs - and make it
>>>>> transferable. id.org.au seems to be available at the moment, for
>>>>> example.
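
What "one job only" could look like, as a minimal Python sketch. Every
name here is hypothetical: the REDIRECTS table, the resolve function and
the example.org targets are invented for illustration, and 303 See Other
is just one reasonable choice of status code (302 or 307 would serve
too).

```python
from typing import Optional, Tuple

# Hypothetical redirect table: persistent path prefix -> current location.
# Moving a service means editing one entry; the published URIs never change.
REDIRECTS = {
    "/vocab/anzsrc-for/": "http://vocabs.example.org/anzsrc-for/",
    "/grant/": "http://grants.example.org/view/",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a request path on the persistent domain."""
    # Longest prefix first, so specific rules override more general ones.
    for prefix in sorted(REDIRECTS, key=len, reverse=True):
        if path.startswith(prefix):
            return 303, REDIRECTS[prefix] + path[len(prefix):]
    return 404, None


print(resolve("/grant/12345"))
print(resolve("/unknown"))
```

The whole service is a lookup table plus a redirect, which is what makes
it cheap to run and easy to hand over to another organisation.
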
>>>>>
>>>>> purl.org was set up as *the* solution, thus creating a single point of
>>>>> failure - which is now close to failure. Relying on someone else's
>>>>> centralised system, be it purl or DOI, leaves you susceptible to their
>>>>> future failures. Stepping in would mean taking on responsibility for
>>>>> other people's redirections as well as your own. So be decentralised,
>>>>> use the Web, look after your own needs.
>>>>>
>>>>> My 2 cents.
>>>>>
>>>>> Phil.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 19/04/2016 05:42, Monica Omodei wrote:
>>>>>
>>>>>> I hope this is an appropriate forum to ask advice.
>>>>>>
>>>>>> We have been using the purl.org resolver service for some years to
>>>>>> provide a globally unique, persistent, resolvable identifier for
>>>>>> Australian research grants so they can be used in metadata describing
>>>>>> research outputs like publications, data, software etc. They resolve
>>>>>> currently to a view page in Research Data Australia -
>>>>>> researchdata.ands.org.au. There is an API but it returns only JSON
>>>>>> currently, not XML or RDF.
>>>>>>
>>>>>> We decided not to run our own resolver service because we are not an
>>>>>> ongoing organisation with a guaranteed persistent domain and felt that
>>>>>> a public resolver service was more suitable. At the time purl.org
>>>>>> seemed the right choice. We are now struggling because we cannot make
>>>>>> any changes. As has been noted in this forum the admin UI is not
>>>>>> available at the moment. We also were concerned when the ability to
>>>>>> create our own sub-domain was removed as we envisage wanting to hand
>>>>>> over some sub-domains from our root, *au-research*, to different
>>>>>> parties for maintenance.
>>>>>>
>>>>>> We now also support the Vocabularies Australia Service
>>>>>> http://ands.org.au/online-services/research-vocabularies-australia
>>>>>> using the SISSVoc software which was established to assist with the
>>>>>> publication and widespread use of scientific vocabularies.
>>>>>>
>>>>>> The ANZSRC Field of Research Vocabulary (ABS 1297.0) which is used to
>>>>>> classify research and its outputs in Australia and New Zealand is also
>>>>>> published through this service and we maintain purls for these
>>>>>> vocabulary terms eg
>>>>>>
>>>>>> http://purl.org/au-research/vocabulary/anzsrc-for/2008/
>>>>>>
>>>>>> We want to extend this provision of PURLs to other vocabularies but we
>>>>>> cannot use the same purl domain because of the current problems with
>>>>>> the OCLC-supported purl.org service.
>>>>>>
>>>>>> We need to decide whether to -
>>>>>>
>>>>>>       - switch to w3id.org as the domain for the other vocabulary purls
>>>>>>       - run our own resolver service under a domain name we think can be
>>>>>>       transferred to another organisation if/when necessary
>>>>>>       - look for another public resolver service
>>>>>>       - be patient and wait for the situation with purl.org to resolve
>>>>>>       itself (no pun intended)
>>>>>>
>>>>>> Comments welcome,
>>>>>>
>>>>>> Monica Omodei
>>>>>> Project Manager
>>>>>> Australian National Data Service
>>>>>>
>>>>>>
>>>>> -- 
>>>>>
>>>>>
>>>>> Phil Archer
>>>>> W3C Data Activity Lead
>>>>> http://www.w3.org/2013/data/
>>>>>
>>>>> http://philarcher.org
>>>>> +44 (0)7887 767755
>>>>> @philarcher1
>>>>>
>>>>
>>>
>>> -- 
>>>
>>>
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>>>
>>
>>
>
Received on Tuesday, 26 April 2016 19:10:29 UTC