- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Sun, 13 Dec 2009 20:20:39 +0000
- To: Linked Data community <public-lod@w3.org>
- Cc: John Sheridan <John.Sheridan@nationalarchives.gsi.gov.uk>
Hi,

Dave (Reynolds) raised the point that lists are an integral part of most APIs. This is another thing that we know we need to address in the UK linked government data project, but are unsure as yet how best to do so. This is a bit of a brain dump of my current thinking, which is mostly packed with uncertainty! I'd be very grateful for any thoughts, guidance, pointers that you have.

The situation is that we have our nice linked data all up and available at suitable URIs, for example:

  http://education.data.gov.uk/id/school/520965

(somewhat password protected; if you ignore all the prompts you'll be able to see the HTML page just without any styling) but we need to somehow provide better mechanisms for people to navigate around it. The kind of thing I'd like to see is support for URLs like:

  http://education.data.gov.uk/doc/school
    - list of all schools
  http://education.data.gov.uk/doc/school/phase/nursery
    - list of nursery schools
  http://education.data.gov.uk/doc/school/administrativeDistrict/47UE
    - list of schools whose administrative district is Worcester

and so on.

# Defining List Membership #

The first question is: How do we define which resources are members of a list?

I've discussed this quite briefly with Leigh and Ian (Davis). They seem to favour explicitly incorporating information about the membership of such lists within the triplestore itself. For example, perhaps something like:

  <http://education.data.gov.uk/id/school>
    a rdf:Bag ;
    rdfs:label "All Schools"@en ;
    rdfs:member
      <http://education.data.gov.uk/id/school/100000> ,
      <http://education.data.gov.uk/id/school/100001> ,
      <http://education.data.gov.uk/id/school/100002> ,
      ...

  <http://education.data.gov.uk/id/school/phase/nursery>
    a rdf:Bag ;
    rdfs:label "Nursery Schools"@en ;
    rdfs:member
      <http://education.data.gov.uk/id/school/500003> ,
      <http://education.data.gov.uk/id/school/500004> ,
      <http://education.data.gov.uk/id/school/500005> ,
      ...
  <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
    a rdf:Bag ;
    rdfs:label "Schools in Worcester"@en ;
    rdfs:member
      <http://education.data.gov.uk/id/school/116749> ,
      <http://education.data.gov.uk/id/school/116750> ,
      <http://education.data.gov.uk/id/school/116751> ,
      ...

This is reasonably nice in that you can make up whatever lists you like without being tied to particular conventions in the list URI. It also means that the information about what things are in what lists is right there, and queryable, within the triplestore. So you could find nursery schools in Worcester with:

  SELECT ?school
  WHERE {
    <http://education.data.gov.uk/id/school/phase/nursery>
      rdfs:member ?school .
    <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
      rdfs:member ?school .
  }

The difficulty lies in ensuring that the lists are correct to start with, especially when the data might come from multiple sources with their own generation routines, and that they remain up to date as the data changes over time. In particular, I think this approach really prevents you from layering an API over an existing set of data: the publishers of the linked data have to also determine how the API works, when otherwise those roles could be reasonably cleanly separated.

An alternative is to somehow define the lists in terms of a SPARQL query or a higher-level declarative mechanism. For example, we might do:

  <http://education.data.gov.uk/id/school>
    a api:List ;
    rdfs:label "Schools"@en ;
    api:itemType <http://education.data.gov.uk/def/school/School> .

or:

  <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
    a api:List ;
    rdfs:label "Schools in Worcester"@en ;
    api:where "?item <http://education.data.gov.uk/def/school/districtAdministrative> <http://statistics.data.gov.uk/id/local-authority-district/47UE>" .

or use something like SPIN [1] to express the query as RDF.
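To make the declarative option a little more concrete, here is a minimal Python sketch of how an API layer might turn such a list definition into a SPARQL query. It is only an illustration under assumptions: the function name `list_query` is made up, and it assumes that `api:itemType` means "every ?item with that rdf:type" and that `api:where` holds a graph pattern over the variable ?item, neither of which is pinned down above.

```python
from typing import Optional


def list_query(item_type: Optional[str] = None,
               where: Optional[str] = None) -> str:
    """Build a SPARQL SELECT for a list definition (hypothetical helper).

    item_type: value of api:itemType, assumed to constrain rdf:type.
    where:     value of api:where, assumed to be a pattern over ?item.
    """
    patterns = []
    if item_type is not None:
        # Assumption: api:itemType is shorthand for an rdf:type constraint.
        patterns.append(f"?item a <{item_type}> .")
    if where is not None:
        # Normalise the trailing full stop so patterns can be concatenated.
        patterns.append(where.strip().rstrip(".").strip() + " .")
    if not patterns:
        raise ValueError("a list needs api:itemType or api:where")
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT ?item\nWHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    print(list_query(item_type="http://education.data.gov.uk/def/school/School"))
```

Passing both arguments simply conjoins the two patterns, which is one way the "nursery schools in Worcester" style of intersection could be expressed without materialising any rdf:Bag.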
Or we could go one level higher and do something like:

  <http://education.data.gov.uk/id/school/phase/*>
    a api:ListSet ;
    rdfs:label "Schools By Phase of Education"@en ;
    api:pattern "http://education.data.gov.uk/id/school/phase/(nursery|primary|secondary)"^^xsd:string ;
    api:map [
      api:regexGroup 1 ;
      api:property rdf:type ;
      api:enumeration [
        api:token "nursery" ;
        api:resource <http://education.data.gov.uk/def/school/TypeOfEstablishment_EY_Setting> ;
      ], ...
    ] .

It's not at all clear to me what the best approach is here. I tend to think that although a higher-level language might make things simpler in some ways, providing SPARQL queries gives the most flexibility. Anyone have any thoughts?

# Pagination and Sorting #

Lists are often going to be very long, so we'll need some way to support paging through the results that come back. It might also be useful to provide different sort orders. For example:

  http://education.data.gov.uk/doc/school?sort=label&startIndex=21&itemsPerPage=20

should give the second page of (20) results, in label order.

What I thought here is that we should assign *collections* URIs like:

  http://education.data.gov.uk/id/school

These are unordered and unpaginated. A request would result in a 303 redirect to the document:

  http://education.data.gov.uk/doc/school

which is the same as:

  http://education.data.gov.uk/doc/school?sort=label&startIndex=1&itemsPerPage=20

(say) and is the first page of the (ordered, paginated) list. The RDF graph actually returned would be something like:

  <http://education.data.gov.uk/id/school>
    rdfs:label "Schools"@en ;
    foaf:isPrimaryTopicOf <http://education.data.gov.uk/doc/school> .

  <http://education.data.gov.uk/doc/school>
    rdfs:label "Schools (First 20, Ordered Alphabetically)"@en ;
    foaf:primaryTopic <http://education.data.gov.uk/id/school> ;
    xhv:next <http://education.data.gov.uk/doc/school?startIndex=21> ;
    ... other metadata ...
    api:items (
      <http://education.data.gov.uk/id/school/135160>
      <http://education.data.gov.uk/id/school/135441>
      <http://education.data.gov.uk/id/school/135868>
      ...
    ) .

  <http://education.data.gov.uk/id/school/135160>
    rdfs:label "# New Comm Pri @ Allaway Avenue" ;
    ... other triples ...

  ... statements about the other members of this list ...

Note here that the triples about the collection are curtailed to not include all the members of the collection (since to include them would kinda defeat the purpose of the pagination). If the collection were defined through a mechanism other than a list of members, then including that configuration information would be a good thing to do here.

One possibility is to curtail the information that's available about each item: we'd probably always want to include label and type, but maybe other things would be useful as well, like location in the case of a school. Then again, perhaps it's best just to include everything we can about those items (eg a labelled concise bounded description [2]); clients can always filter out what they don't need.

To facilitate ordering, we need a capability to label the different properties with short names; this is similar to (maybe the same as) the requirement for JSON renditions of RDF, so it would make sense to use the same kind of rules (ie default to the local name of the property, provide configuration to override).

# Representations #

There are four kinds of representations that are particularly useful for lists:

  * RDF (RDF/XML, Turtle), obviously, for semweb heads
  * a feed (Atom or RSS) for humans to subscribe to
  * HTML for humans to look at
  * JSON of some description for normal developers

The first two could be hit through using RSS 1.0 to describe the list. Does anyone have any thoughts about whether that would be the best approach? I think the only thing that makes me hesitate is the use of rdf:Seq to list the items, which I gather has fallen out of vogue.
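Coming back to the paging scheme in the Pagination and Sorting section, the arithmetic behind startIndex and itemsPerPage can be sketched roughly as below. This is a hedged Python illustration, not a proposed implementation: the function name `page_links` is invented, startIndex is taken to be 1-based as in the example URLs, and it assumes the total number of list members is known up front.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def page_links(base: str, total: int, start_index: int = 1,
               items_per_page: int = 20,
               sort: str = "label") -> Tuple[int, Optional[str], Optional[str]]:
    """Return (offset, prev_url, next_url) for one page (hypothetical helper).

    offset is zero-based, so it could be used directly as a SPARQL OFFSET
    alongside LIMIT items_per_page.
    """
    if start_index < 1 or items_per_page < 1:
        raise ValueError("startIndex and itemsPerPage must be >= 1")

    def url(start: int) -> str:
        # Parameter names follow the example URLs in the text.
        params = {"sort": sort, "startIndex": start,
                  "itemsPerPage": items_per_page}
        return base + "?" + urlencode(params)

    offset = start_index - 1
    next_start = start_index + items_per_page
    prev_start = max(1, start_index - items_per_page)
    # Only link to a next page if it would contain at least one item.
    next_url = url(next_start) if next_start <= total else None
    # The first page has no previous page.
    prev_url = url(prev_start) if start_index > 1 else None
    return offset, prev_url, next_url


if __name__ == "__main__":
    print(page_links("http://education.data.gov.uk/doc/school", total=45))
```

The next_url value is what would go in the xhv:next triple on the page document; a page with no successor would simply omit that triple.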
---

Anyway, hopefully Dave will set up a Google Code project where we can try to spec some of this out and maybe get some implementations in place.

Any thoughts welcome!

Jeni

[1]: http://spinrdf.org/sp.html
[2]: http://n2.talis.com/wiki/Bounded_Descriptions_in_RDF#Labelled_Concise_Bounded_Description

--
Jeni Tennison
http://www.jenitennison.com
Received on Sunday, 13 December 2009 20:21:14 UTC