Re: Persistence

Hi,
As soon as I hit send, I realized someone would take exception to my "Open Source for Open Government Data", which should have been more thoughtfully written as "Commercially supported Open Source for Open Government Data, as appropriate".  Sorry for not being more inclusive.  As long as the data can be exchanged in a manner compliant with international standards, we're all happy (I hope).  Of course, each company is free to determine appropriate business models to support its efforts, be it proprietary solutions or commercially supported FLOSS.

Please don't take my comment about PURLs to mean that I was recommending OCLC.  I simply stated that OCLC has addressed open, persistent identifiers for managing Web resources primarily for the *library community*.  They don't want to take on other markets AFAIK.  I further pointed out that they're a non-profit with a .org, and that they use Open Source software, namely http://purlz.org, to handle redirection, which is one of the big issues for government.  There are many such issues, but good resource persistence management and reporting isn't appreciated the way it needs to be.  To be clear, I suggest that the Linked Data community, on behalf of the Government Open Data effort, consider the Open Source PURLs solution.  I'm not suggesting any connection to OCLC.

The role of an executor is a social engineering issue that a non-profit is potentially best poised to handle ... but, as I've said before, I have no expertise in this area, just in PURLs.

I hope that is clearer ... it has been a busy day getting ready to relax with a fat turkey and friends.

Happy Thanksgiving to all who celebrate this holiday.

Cheers,
Bernadette

On Nov 22, 2011, at 9:12 PM, Sandro Hawke wrote:

> On Tue, 2011-11-22 at 14:12 -0500, Bernadette Hyland wrote:
>> Hi,
>> Let's please remember our mantra, "re-use, re-use, re-use" and "Open
>> Source for Open Government Data."  
> 
> Please remember that the W3C community includes vendors of proprietary
> software.  We are happy to see both open source and closed source
> implementations of our Recommendations.  
> 
>> On Nov 22, 2011, at 10:12 AM, Phil Archer wrote:
>> 
>>> This makes a lot of sense.
>>> 
>>> Who would be that will's executor? In the case of public sector
>>> websites, presumably the relevant national archive? Is there a
>>> business model here I wonder ;-)
>>> 
>> 
>> 
>> Yes, it's called a non-profit, like OCLC, which for better or for worse
>> supports the worldwide library community through the purl.org domain.
>> A look at OCLC's website is instructive.  The library community has
>> wrestled with & solved many of the issues the LD/LOD community raises.
>> See below.
> 
> Purls are a reasonable way to avoid having to pay domain-name
> registration fees, front-end hosting fees, and having to manage a
> front-end, but I don't think they solve the big problems here.
> 
> I don't see anything on the purl.org site suggesting they are willing or
> able to take on the executor role, or even to work with someone who
> would take on this role.    I expect they could be persuaded to do the
> latter, and I'd hope they might do the former, at least for the library
> community.    (I'm happy they've adopted 303 redirects; it took quite a
> few years.)
> 
> I'm not quite sure who the executor role could best be done by.  It
> should be an institution that is more stable and trusted than the
> day-to-day organization, and which is motivated to maintain public
> trust.
> 
> I have a mental image from old fiction of the lawyer who spends 20 years
> hunting down the heirs to some estate before delivering the inheritance.
> I don't know how that actually works, though, or how to get a lawyer to
> do that.  
> 
> Some ideas:
> 
>  - very large business service companies, eg IBM.  The money to pay for
> the service should perhaps be put in a separate annuity account, so
> providing the service remains eternally funded, and that business unit
> can be sold off, but couldn't easily go bankrupt.
>  - a private foundation with an endowment and some alignment of goals
> (suggestions?   I don't know this space very well)
>  - a stable, non-controversial government agency (eg National Archives,
> as you say; in the US, perhaps the Library of Congress)
>  - an international treaty organization (UN, WTO, ITU)
>  - Internet organizations, like W3C, ICANN, ISoc
>  - ...?
> 
> I don't think we should select these -- just describe what they should
> do, and maybe in the directory we can have a few organizations willing
> to do it.
> 
>> 
>>> As for top level domains, some are more politically acceptable than
>>> others of course. Perversely perhaps, it seems that a vocabulary
>>> hosted on example.eu, example.us or example.gov.uk might face more
>>> resistance to uptake than example.ie or example.ly, especially if it
>>> spelled out a nice word like semantical.ly (which appears to be
>>> available btw).
>> 
>> Are you kidding?? ly = Libya.  Anyone who approves that for government
>> use should have their badge taken away.  Seriously?!
> 
> He did say "perversely".   I take the point to be that folks will not
> always be rational about making decisions like that.   "id.gov.uk" looks
> like a national thing, while domain hacks like "semantical.ly" or
> "repli.ca", for better or worse, to the lay person, do not.
> 
>>> 
>>> What we're talking about is maintaining a set of URIs for the long
>>> term for vocabularies. For documents and Web content in general, an
>>> archivist might take a different view. Britain's National Archives
>>> can, legitimately, say that, for example, the Bercow Report of July
>>> 2008 is still publicly available online. It's at:
>>> 
>>> http://webarchive.nationalarchives.gov.uk/20080528125538/http://www.dcsf.gov.uk/bercowreview/docs/7771-DCSF-BERCOW.PDF
>>> 
>>> The issue though is that it used to be at
>>> http://www.dcsf.gov.uk/bercowreview/docs/7771-DCSF-BERCOW.PDF
>>> 
>> 
>> 
>> This example is *precisely* the case for implementing a persistent
>> identifier solution (also called a permanent URL architecture).  
> 
> I don't think a PURL architecture helps.   Let's look at why the
> dcsf.gov.uk URL no longer works:
> 
> 1.  The department changed its name.  It wanted to be known as
> "education" not "dcsf".   So if its purls had "dcsf" in them, the
> department would still want/need them changed.
> 
> 2.  The department didn't plan for the future and/or doesn't care about
> the past.  They still have the name "www.dcsf.gov.uk", and it's set up
> to redirect.  Why aren't they redirecting the document Phil mentions?
> They didn't think enough, before, to make it easy to maintain, and they
> don't care enough now, to bother to maintain it.
> 
> As an organization, the W3C administers w3.org with some care to this.
> Mostly, we have an institutional commitment to keep old URLs functioning.
> Knowing we'll be doing that, it's unlikely we would ever make a
> top-level directory like "bercowreview".
> 
> The actual policy is that any staff member can claim a name like
> "http://www.w3.org/2011/11/foo", but to get "http://www.w3.org/2011/bar"
> or especially "http://www.w3.org/baz" requires top-level management
> approval.  To get outside of "date space" (as with "baz"), you have to
> convince management that for the rest of time, that will be a sensible
> name for resources in that space.    After 11 years at W3C, I finally
> got my first of those, with /egov.    (And I might be buying headaches
> for us down the road, but I argued convincingly that having 2007 in the
> URL made people think the content was old.)   News organizations and
> some blog and CMS software seem to have mostly figured this out, now.
> 
>> PURLs  is one such Open Source project that is used extensively by the
>> worldwide library community, the US Government Printing Office through
>> the US Federal Depository Library Program (for which 3 Round
>> Stones provides commercial support), National Center for Biomedical
>> Ontology, Shared Names, among others.
>> 
>> 
>>> and if anyone had linked to the original URI then someone following
>>> that link would see a short HTML page explaining that the dcsf.gov.uk
>>> site is no longer in operation, where the current live version is,
>>> and where the archive is. That's a very basic message for humans and
>>> no message at all for machines.
>>> 
>>> Hmmm... Given that the original URI of the doc is preserved within
>>> the new one, it shouldn't be too hard to come up with a script that
>>> automatically gave a 301 redirect *if* the target gave a sensible
>>> 200 response and a helpful message in case the target led to a 404?
>>> 
>>> Phil.
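
One possible reading of the script Phil describes, as a minimal Python sketch. The names `ARCHIVE_PREFIX`, `archive_url`, and `decide_response` are hypothetical, and the sketch assumes that a single snapshot timestamp (the one in the archive URL quoted above) applies to every archived document, which a real web archive would not guarantee. The status check is passed in as a function so the decision logic can be exercised without network access.

```python
# Hypothetical sketch: redirect to the archive only if the archived copy answers 200.
# Assumption: one fixed snapshot timestamp covers all documents.
ARCHIVE_PREFIX = "http://webarchive.nationalarchives.gov.uk/20080528125538/"


def archive_url(original_url):
    """Build the archive URL, which embeds the original URL unchanged."""
    return ARCHIVE_PREFIX + original_url


def decide_response(original_url, fetch_status):
    """Return (status, headers, body) for a request to original_url.

    fetch_status is any callable that takes a URL and returns the HTTP
    status code observed when requesting it.
    """
    target = archive_url(original_url)
    if fetch_status(target) == 200:
        # The archived copy exists: send a permanent redirect for humans and machines.
        return 301, {"Location": target}, ""
    # No usable archived copy: explain the situation instead of redirecting.
    body = (
        "This document is no longer available at " + original_url + ".\n"
        "No archived copy was found at " + target + ".\n"
    )
    return 404, {"Content-Type": "text/plain"}, body
```

In production, `fetch_status` could be a small wrapper that issues a HEAD request to the archive; in a test it can simply be `lambda url: 200`.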
>>> 
>> 
>> 
>> Hang on, there is no need to write a script or reinvent the wheel
>> here.  
> 
> Surely the purl.org software is massive overkill.  Phil is talking about
> a few lines in an Apache config.
> 
> *IF* the DCSF had used date-space exclusively, they could do all the
> redirects like:
> 
> RewriteRule ^/?((2007|2008|2009|2010)/.*)$ http://webarchive.nationalarchives.gov.uk/20080528125538/http://www.dcsf.gov.uk/$1 [R=301,L]
> 
> (for whatever years they no longer have active.)
> 
> If they didn't use date-space, they'd need to list all the resources
> moved over to the web archive, like this:
> 
> RewriteRule ^/?bercowreview/docs/7771-DCSF-BERCOW\.PDF$ http://webarchive.nationalarchives.gov.uk/20080528125538/http://www.dcsf.gov.uk/bercowreview/docs/7771-DCSF-BERCOW.PDF [R=301,L]
> 
> That's more of a hassle, but it still works.   It could also be done
> with a tiny CGI script that consults a database of all the archived
> URLs.
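
The tiny CGI script mentioned above might look roughly like the following Python sketch. The table name `archived`, the helper names `open_archive_db` and `cgi_response`, and the choice of SQLite are all assumptions made for illustration; nothing in this thread specifies them.

```python
import sqlite3


def open_archive_db(rows):
    """Create an in-memory table mapping old request paths to archive URLs.

    rows is an iterable of (path, target) pairs. A real deployment would
    open a database file maintained by whoever moves content to the archive.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE archived (path TEXT PRIMARY KEY, target TEXT NOT NULL)"
    )
    db.executemany("INSERT INTO archived VALUES (?, ?)", rows)
    return db


def cgi_response(db, path):
    """Return the full CGI response text for a requested path."""
    row = db.execute(
        "SELECT target FROM archived WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        # Path was never archived: plain 404 with a short explanation.
        return (
            "Status: 404 Not Found\r\n"
            "Content-Type: text/plain\r\n\r\n"
            "No archived copy is recorded for this address.\n"
        )
    # Path is in the table: permanent redirect to the archived copy.
    return "Status: 301 Moved Permanently\r\nLocation: " + row[0] + "\r\n\r\n"
```

A CGI wrapper would read the requested path from the `PATH_INFO` environment variable and print the returned string.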
> 
>> For permanent URLs that transcend changing infrastructure, I urge
>> using the modern PURLs server, which has been running in production
>> for 3+ years as OCLC's PURLs service and 2+ years for the US
>> Government Printing Office.  The predecessor to the modern PURLs
>> server was in production for 12 years.  This is a tested & proven
>> solution to permanent URL architecture.
>> 
>> 
>> The Open Source PURLs server is a web-scale, production application
>> with an easy to use interface (a bookmarklet for creating PURLs), and
>> nice reporting capabilities for maintenance.
>> 
>> 
>> PURLs is based on HTTP and URI specs from the IETF.  Recently we've
>> thrown in some TAG decisions and W3C Best Practices for use with RDF
>> and Linked Data (303 support).
>> 
>> 
>> Check out the PURLs site & if you have further questions about
>> production deployments, I'm happy to respond to them.  Re-use, re-use,
>> re-use.
> 
> I have nothing against PURLs for folks who, for one reason or another,
> can't realistically maintain their own web space on their own domain
> names, but I don't really see it helping when:
> 
>   - organizations need their name in their URLs and sometimes
>     need to change their name
>   - organizations don't plan their URL space to allow for decades
>     of accumulation
>   - organizations disappear, or change their mission so much they
>     will ignore the users they once pledged to support
> 
> These are the things I'm trying to address.  I think extra domain names
> and living wills are a pretty good (if unproven) answer.
> 
>   -- Sandro
>> 
>> Cheers,
>> 
>> 
>> Bernadette Hyland
>> co-chair W3C Government Linked Data Working Group
>> charter http://www.w3.org/2011/gld/charter
>> 
>> 
>>> 
>>> On 22/11/2011 14:30, Richard Cyganiak wrote:
>>>> On 17 Nov 2011, at 19:26, Sandro Hawke wrote:
>>>>> My strawman proposal would be:
>>>>> 
>>>>> - vocabularies should be given their own domain name, probably
>>>>> in .net
>>>>> (they are infrastructure).   this way full ownership as well as
>>>>> maintenance duties can be transferred, legally, as necessary.
>>>> 
>>>> +1. Getting its own domain for the vocabulary also helps keep
>>>> the URIs short.
>>>> 
>>>> On the other hand, using something like purl.org also seems
>>>> reasonable.
>>>> 
>>>> I'm agnostic regarding the top-level domain. I note that the .net
>>>> TLD isn't terribly popular and I can't think of many current
>>>> examples of vocabularies in the .net namespace.
>>>> 
>>>>> - there should be a two-level ownership structure, where one
>>>>> disinterested, trusted, 3rd party (like the executor of a will)
>>>>> retains
>>>>> final control, but delegates to the creator/maintainer.   With
>>>>> written
>>>>> policies about what happens in various eventualities.   But,
>>>>> basically,
>>>>> if either of these parties loses interest, they can be smoothly
>>>>> replaced, and if the creator/maintainer ceases operation or
>>>>> stops acting
>>>>> in good faith, it can be replaced.
>>>> 
>>>> Again, +1.
>>>> 
>>>> Best,
>>>> Richard
>>>> 
>>> 
>>> -- 
>>> 
>>> 
>>> Phil Archer
>>> W3C eGovernment
>>> http://www.w3.org/egov/
>>> 
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>>> 
>>> 
>> 
>> 
> 
> 
> 

Received on Wednesday, 23 November 2011 03:04:38 UTC