Open data autodiscovery for organisations

Hi, I can't remember if I ever mentioned our project 
<http://opd.data.ac.uk/> on this list?

It's a method for auto-discovering open data from (and about)
an organisation, given its website. It's specifically aimed at
"predictable" datasets that you would expect (or hope) an organisation to
provide, rather than one-off datasets, which you would look up in an
organisation's data catalogue rather than try to discover for a specific
purpose.

Our pilot use-case was discovering lists of research equipment from UK
universities. Every week we check *every* www.???.ac.uk website for
open data, and if we find some, we add it to the daily crawl.
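The weekly discovery pass and the daily crawl are two separate loops. Below is a minimal sketch of the weekly side. It assumes a hypothetical `find_opd(host)` callable that returns the URL of a site's profile document, or `None` if nothing was found. The `www.<name>.ac.uk` host pattern follows the description in this post, and every function name here is illustrative only, not the project's actual code.

```python
import re
from typing import Callable, Iterable, Optional

# Accepts exactly www.<name>.ac.uk; deeper subdomains such as
# www.chem.example.ac.uk are rejected because [a-z0-9-] excludes dots.
CANDIDATE_HOST = re.compile(r"www\.[a-z0-9-]+\.ac\.uk")


def is_candidate_host(host: str) -> bool:
    """Return True if host looks like a top-level UK university website."""
    return CANDIDATE_HOST.fullmatch(host.strip().lower()) is not None


def weekly_discovery(
    hosts: Iterable[str],
    find_opd: Callable[[str], Optional[str]],
) -> dict[str, str]:
    """Check every candidate host and collect OPD URLs for the daily crawl.

    find_opd is a hypothetical callable: given a host, it returns the URL
    of that organisation's profile document, or None if none was found.
    """
    daily_crawl: dict[str, str] = {}
    for host in hosts:
        if not is_candidate_host(host):
            continue  # not a www.<name>.ac.uk site, so skip it
        opd_url = find_opd(host)
        if opd_url is not None:
            daily_crawl[host] = opd_url
    return daily_crawl


if __name__ == "__main__":
    # Stand-in for real discovery: a dict lookup returns None for misses.
    known = {"www.example.ac.uk": "http://www.example.ac.uk/profile.ttl"}
    sites = ["www.example.ac.uk", "www.chem.example.ac.uk", "www.other.ac.uk"]
    print(weekly_discovery(sites, known.get))
    # prints {'www.example.ac.uk': 'http://www.example.ac.uk/profile.ttl'}
```

Injecting the discovery function keeps the scheduling logic separate from the network code, so the weekly pass can be tested without touching any real websites.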

Signposting complete datasets from the homepage is more in keeping with
the distributed web, as it means you don't need a big crawler.

One thing we felt was very important was to make it easy for non-experts
to create the basic RDF document describing their organisation. To this
end, we have given verbatim examples rather than just published
ontologies <http://opd.data.ac.uk/docs/social>. I'm proud to say we've
seen .ttl files produced by non-IT admin staff! To make things easier
still, we provide a tool that checks whether the .ttl file can be
autodiscovered, whether it is valid, and what it actually says
<http://opd.data.ac.uk/checker>.
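To give a flavour of how small such a document can be, here is a sketch that fills in a Turtle template for an organisation. The FOAF terms used (`foaf:primaryTopic`, `foaf:Organization`, `foaf:name`, `foaf:homepage`) are standard, but the exact classes and properties an OPD is expected to contain should be taken from the project's published examples. The template, the function name `minimal_opd` and the example URIs are assumptions for illustration.

```python
def minimal_opd(org_uri: str, name: str, homepage: str) -> str:
    """Return a minimal Turtle description of an organisation.

    Illustrative template only (an assumption, not the official OPD
    layout): check the published OPD examples for the exact terms the
    checker expects.
    """
    # Escape backslashes and double quotes so the name is a valid literal.
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
        "\n"
        # <> refers to the document itself; its primary topic is the org.
        f"<> foaf:primaryTopic <{org_uri}> .\n"
        "\n"
        f"<{org_uri}> a foaf:Organization ;\n"
        f'    foaf:name "{safe_name}" ;\n'
        f"    foaf:homepage <{homepage}> .\n"
    )


if __name__ == "__main__":
    print(minimal_opd(
        "http://id.example.ac.uk/",   # hypothetical organisation URI
        "Example University",
        "http://www.example.ac.uk/",
    ))
```

Generating the file from a fill-in-the-blanks template mirrors the "copy the verbatim example" approach: the person editing it only has to supply a URI, a name and a homepage.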

We provide two alternative mechanisms to discover the OPD (the .ttl file
describing the organisation and its major datasets) because, in my
experience of website politics, the IT dept controls the server
(and can easily add a redirect) while the comms dept controls the content
(and can easily add a link to the homepage).
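A sketch of a client that tries both routes, using only the Python standard library. The specific names are assumptions here, not quoted from this post: a redirect served at `/.well-known/openorg` for the server route, and a `<link rel="openorg" href="...">` element in the homepage for the content route. Check the OPD documentation for the exact path and `rel` value before relying on them.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin
from urllib.request import urlopen

WELL_KNOWN_PATH = "/.well-known/openorg"  # assumed path; see OPD docs
LINK_REL = "openorg"                      # assumed rel value; see OPD docs


class _OpdLinkFinder(HTMLParser):
    """Record the href of the first <link> whose rel includes LINK_REL."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if LINK_REL in rels and attr_map.get("href"):
            self.href = attr_map["href"]


def find_opd_link(homepage_url: str, html: str) -> Optional[str]:
    """Content route: resolve an OPD link found in the homepage markup."""
    finder = _OpdLinkFinder()
    finder.feed(html)
    return urljoin(homepage_url, finder.href) if finder.href else None


def discover_opd(homepage_url: str) -> Optional[str]:
    """Try the server redirect first, then fall back to the homepage link."""
    well_known = urljoin(homepage_url, WELL_KNOWN_PATH)
    try:
        with urlopen(well_known, timeout=10) as resp:
            final_url = resp.geturl()  # urlopen follows redirects
        if final_url != well_known:
            return final_url  # the redirect target is taken as the OPD
    except OSError:
        pass  # HTTP error or network failure: try the homepage instead
    try:
        with urlopen(homepage_url, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return None
    return find_opd_link(homepage_url, html)
```

Requiring an actual redirect (rather than any 200 response) avoids mistaking a "soft 404" page at the well-known path for a profile document.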

At the bottom of http://opd.data.ac.uk/ you can see a respectable list 
of UK universities already implementing this, which I think is an 
exciting place to build from.

While all people *had* to do was add the triples for their equipment
dataset, many followed our other examples and added sections describing
their basic metadata <http://opd.data.ac.uk/dataset/core>, social media
accounts <http://opd.data.ac.uk/dataset/social>, and key webpages
<http://opd.data.ac.uk/dataset/linkingyou>.

This is the technology that underpins <http://equipment.data.ac.uk/>
(though, to my frustration, some sites have been added to this service by
hand, which increases the long-term effort of supporting the site). But
if you happen to have a www.*.ac.uk website (we don't check subdomains),
you should be able to add your data to the equipment list just by
putting the correct information on your website.

At the moment this is all ticking over, but only 27 institutions have
implemented it, and we don't have any new project work based on it.
However, it's all open source and infinitely extensible.

- Christopher Gutteridge, University of Southampton

Received on Monday, 16 April 2018 14:05:01 UTC