- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Sun, 13 Dec 2009 20:20:39 +0000
- To: Linked Data community <public-lod@w3.org>
- Cc: John Sheridan <John.Sheridan@nationalarchives.gsi.gov.uk>
Hi,
Dave (Reynolds) raised the point that lists are an integral part of
most APIs. This is another thing that we know we need to address in
the UK linked government data project, but are unsure as yet how best
to do so.
This is a bit of a brain dump of my current thinking, which is mostly
packed with uncertainty! I'd be very grateful for any thoughts,
guidance, pointers that you have.
The situation is that we have our nice linked data all up and
available at suitable URIs, for example:
http://education.data.gov.uk/id/school/520965
(somewhat password protected; if you ignore all the prompts you'll be
able to see the HTML page just without any styling) but we need to
somehow provide better mechanisms for people to navigate around it.
The kind of thing I'd like to see is support for URLs like:
http://education.data.gov.uk/doc/school
- list of all schools
http://education.data.gov.uk/doc/school/phase/nursery
- list of nursery schools
http://education.data.gov.uk/doc/school/administrativeDistrict/47UE
- list of schools whose administrative district is Worcester
and so on.
# Defining List Membership #
The first question is: How do we define which resources are members of
a list?
I've discussed this quite briefly with Leigh and Ian (Davis). They
seem to favour explicitly incorporating information about the
membership of such lists within the triplestore itself. For example,
perhaps something like:
<http://education.data.gov.uk/id/school>
a rdf:Bag ;
rdfs:label "All Schools"@en ;
rdfs:member
<http://education.data.gov.uk/id/school/100000> ,
<http://education.data.gov.uk/id/school/100001> ,
<http://education.data.gov.uk/id/school/100002> ,
...
<http://education.data.gov.uk/id/school/phase/nursery>
a rdf:Bag ;
rdfs:label "Nursery Schools"@en ;
rdfs:member
<http://education.data.gov.uk/id/school/500003> ,
<http://education.data.gov.uk/id/school/500004> ,
<http://education.data.gov.uk/id/school/500005> ,
...
<http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
a rdf:Bag ;
rdfs:label "Schools in Worcester"@en ;
rdfs:member
<http://education.data.gov.uk/id/school/116749> ,
<http://education.data.gov.uk/id/school/116750> ,
<http://education.data.gov.uk/id/school/116751> ,
...
This is reasonably nice in that you can make up whatever lists you
like without being tied to particular conventions in the list URI. It
also means that the information about what things are in what lists is
right there, and queryable, within the triplestore. So you could find
nursery schools in Worcester with:
SELECT ?school
WHERE {
<http://education.data.gov.uk/id/school/phase/nursery>
rdfs:member ?school .
<http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
rdfs:member ?school .
}
The difficulty lies in ensuring that the lists are correct to start
with, especially when the data might come from multiple sources with
their own generation routines, and that it remains up to date as the
data changes over time. In particular, I think this approach really
prevents you from layering an API over an existing set of data: the
publishers of the linked data have to also determine how the API
works, when otherwise those roles could be reasonably cleanly separated.
An alternative is to somehow define the lists in terms of a SPARQL
query or a higher-level declarative mechanism. For example, we might do:
<http://education.data.gov.uk/id/school>
a api:List ;
rdfs:label "Schools"@en ;
api:itemType <http://education.data.gov.uk/def/school/School> .
or:
<http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
a api:List ;
rdfs:label "Schools in Worcester"@en ;
api:where "?item <http://education.data.gov.uk/def/school/districtAdministrative> <http://statistics.data.gov.uk/id/local-authority-district/47UE>" .
or use something like SPIN [1] to express the query as RDF.
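As a rough sketch of how a server might expand such a declaration into a query, here is one possibility in Python. It assumes the hypothetical api:itemType and api:where properties used above, and that ?item is the variable the where pattern binds; the function name list_query is made up for illustration.

```python
from typing import Optional


def list_query(item_type: Optional[str] = None, where: Optional[str] = None) -> str:
    """Build a SPARQL SELECT from a list's api:itemType and/or api:where.

    Both properties are the hypothetical ones from the examples above;
    neither is part of an existing vocabulary.
    """
    patterns = []
    if item_type is not None:
        # api:itemType becomes a simple rdf:type constraint on ?item.
        patterns.append(f"?item a <{item_type}>")
    if where is not None:
        # api:where is taken verbatim, minus any trailing full stop.
        patterns.append(where.strip().rstrip(".").rstrip())
    if not patterns:
        raise ValueError("a list needs an item type or a where pattern")
    body = " .\n  ".join(patterns)
    return f"SELECT ?item\nWHERE {{\n  {body} .\n}}"
```

Passing the where pattern through unchanged keeps the full flexibility of SPARQL, but it also means the configuration has to be trusted, since nothing here validates it.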
Or we could go one level higher and do something like:
<http://education.data.gov.uk/id/school/phase/*>
a api:ListSet ;
rdfs:label "Schools By Phase of Education"@en ;
api:pattern "http://education.data.gov.uk/id/school/phase/(nursery|primary|secondary)"^^xsd:string ;
api:map [
api:regexGroup 1 ;
api:property rdf:type ;
api:enumeration [
api:token "nursery" ;
api:resource <http://education.data.gov.uk/def/school/TypeOfEstablishment_EY_Setting> ;
],
...
] .
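To make the intent of the pattern-based mapping concrete, here is a minimal Python sketch of how a server might resolve one list URI against it. The api:pattern, api:regexGroup, api:property and api:enumeration names are the hypothetical ones above; the function name where_for is invented here. Only the "nursery" token is spelled out in the example, so the others are left unmapped rather than guessed.

```python
import re
from typing import Optional

# Mirrors the hypothetical api:pattern above, with the dots escaped so
# they match literally.
PATTERN = re.compile(
    r"http://education\.data\.gov\.uk/id/school/phase/(nursery|primary|secondary)"
)
# api:property rdf:type, written out in full.
PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# api:enumeration: token -> resource. Only "nursery" appears in the example.
ENUMERATION = {
    "nursery": "http://education.data.gov.uk/def/school/TypeOfEstablishment_EY_Setting",
}


def where_for(list_uri: str) -> Optional[str]:
    """Return a SPARQL triple pattern for a list URI, or None if unmapped."""
    match = PATTERN.fullmatch(list_uri)
    if match is None:
        return None
    # api:regexGroup 1 selects the captured token.
    resource = ENUMERATION.get(match.group(1))
    if resource is None:
        return None
    return f"?item <{PROPERTY}> <{resource}>"
```

The result is just a where pattern, so this approach can be layered on top of the SPARQL-based one rather than replacing it.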
It's not at all clear to me what the best approach is here. I tend to
think that although a higher-level language might make things simpler
in some ways, providing SPARQL queries gives the most flexibility.
Anyone have any thoughts?
# Pagination and Sorting #
Lists are often going to be very long, so we'll need some way to
support paging through the results that come back. It might also be
useful to provide different sort orders. For example:
http://education.data.gov.uk/doc/school?sort=label&startIndex=21&itemsPerPage=20
should give the second page of (20) results, in label order.
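The arithmetic behind these parameters can be sketched in Python as follows. This assumes startIndex is 1-based (so startIndex=21 with itemsPerPage=20 is the second page) and that a request with no parameters defaults to sort=label, startIndex=1 and itemsPerPage=20; those defaults and both function names are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_window(url, default_sort="label", default_size=20):
    """Return (sort, limit, offset) for a list page URL.

    limit and offset correspond to SPARQL's LIMIT and OFFSET; the offset
    is 0-based, whereas startIndex in the URL is 1-based.
    """
    params = parse_qs(urlsplit(url).query)
    sort = params.get("sort", [default_sort])[0]
    start = int(params.get("startIndex", ["1"])[0])
    size = int(params.get("itemsPerPage", [str(default_size)])[0])
    if start < 1 or size < 1:
        raise ValueError("startIndex and itemsPerPage must be positive")
    return sort, size, start - 1


def next_page_url(url):
    """Return the URL of the page that follows the given one."""
    sort, size, offset = page_window(url)
    query = urlencode(
        {"sort": sort, "startIndex": offset + 1 + size, "itemsPerPage": size}
    )
    return urlunsplit(urlsplit(url)._replace(query=query))
```

This sketch does not know how long the list is, so a real implementation would need to stop offering a next link once the last page is reached.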
What I thought here is that we should assign *collections* URIs like:
http://education.data.gov.uk/id/school
These are unordered and unpaginated. A request would result in a 303
redirect to the document:
http://education.data.gov.uk/doc/school
which is the same as:
http://education.data.gov.uk/doc/school?sort=label&startIndex=1&itemsPerPage=20
(say) and is the first page of the (ordered, paginated) list. The RDF
graph actually returned would be something like:
<http://education.data.gov.uk/id/school>
rdfs:label "Schools"@en ;
foaf:isPrimaryTopicOf <http://education.data.gov.uk/doc/school> .
<http://education.data.gov.uk/doc/school>
rdfs:label "Schools (First 20, Ordered Alphabetically)"@en ;
foaf:primaryTopic <http://education.data.gov.uk/id/school> ;
xhv:next <http://education.data.gov.uk/doc/school?startIndex=21> ;
... other metadata ...
api:items (
<http://education.data.gov.uk/id/school/135160>
<http://education.data.gov.uk/id/school/135441>
<http://education.data.gov.uk/id/school/135868>
...
) .
<http://education.data.gov.uk/id/school/135160>
rdfs:label "# New Comm Pri @ Allaway Avenue" ;
... other triples ...
... statements about the other members of this list ...
Note here that the triples about the collection are curtailed to not
include all the members of the collection (since to include them would
kinda defeat the purpose of the pagination). If the collection were
defined through a mechanism other than a list of members, then
including that configuration information would be a good thing to do
here.
One possibility is to curtail the information that's available about
each item: we'd probably always want to include label and type, but
maybe other things would be useful as well, like location in the case
of a school. Then again, perhaps it's best just to include everything
we can about those items (eg a labelled concise bounded description
[2]); clients can always filter out what they don't need.
To facilitate ordering, we need a capability to label the different
properties with short names; this is similar to (maybe the same as)
the requirement for JSON renditions of RDF, so it would make sense to
use the same kind of rules (ie default to the local name of the
property, provide configuration to override).
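A minimal Python sketch of that rule, defaulting to the local name and allowing configured overrides, might look like the following. The function names and the decision to reject clashing short names are assumptions made for illustration, not part of any agreed design.

```python
import re


def local_name(property_uri):
    """Return the part of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", property_uri)[-1]


def short_names(property_uris, overrides=None):
    """Map short names to property URIs.

    Each property defaults to its local name; `overrides` maps a property
    URI to a configured short name. Two properties ending up with the
    same short name is treated as a configuration error.
    """
    overrides = overrides or {}
    names = {}
    for uri in property_uris:
        name = overrides.get(uri, local_name(uri))
        if name in names:
            raise ValueError(f"short name {name!r} is used by more than one property")
        names[name] = uri
    return names
```

With this in place, a request for sort=label could be resolved to rdfs:label by looking up "label" in the returned mapping.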
# Representations #
There are four kinds of representations that are particularly useful
for lists:
* RDF (RDF/XML, Turtle), obviously, for semweb heads
* a feed (Atom or RSS) for humans to subscribe to
* HTML for humans to look at
* JSON of some description for normal developers
The first two could both be covered by using RSS 1.0 to describe the list.
Does anyone have any thoughts about whether that would be the best
approach? I think the only thing that makes me hesitate is the use of
rdf:Seq to list the items, which I gather has fallen out of vogue.
---
Anyway, hopefully Dave will set up a Google Code project where we can
try to spec some of this out and maybe get some implementations in
place.
Any thoughts welcome!
Jeni
[1]: http://spinrdf.org/sp.html
[2]: http://n2.talis.com/wiki/Bounded_Descriptions_in_RDF#Labelled_Concise_Bounded_Description
--
Jeni Tennison
http://www.jenitennison.com
Received on Sunday, 13 December 2009 20:21:14 UTC