- From: Herbert Van de Sompel <hvdsomp@gmail.com>
- Date: Sun, 13 Dec 2009 16:59:15 -0500
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: Linked Data community <public-lod@w3.org>, John Sheridan <John.Sheridan@nationalarchives.gsi.gov.uk>
Hi Jeni

I wonder whether ORE Aggregations could be (part of) a solution:
http://www.openarchives.org/ore/1.0/toc

Greetings

Herbert Van de Sompel

Sent from my iPhone

On Dec 13, 2009, at 15:20, Jeni Tennison <jeni@jenitennison.com> wrote:

> Hi,
>
> Dave (Reynolds) raised the point that lists are an integral part of
> most APIs. This is another thing that we know we need to address in
> the UK linked government data project, but are unsure as yet how
> best to do so.
>
> This is a bit of a brain dump of my current thinking, which is
> mostly packed with uncertainty! I'd be very grateful for any
> thoughts, guidance, pointers that you have.
>
> The situation is that we have our nice linked data all up and
> available at suitable URIs, for example:
>
>   http://education.data.gov.uk/id/school/520965
>
> (somewhat password protected; if you ignore all the prompts you'll
> be able to see the HTML page just without any styling) but we need
> to somehow provide better mechanisms for people to navigate around it.
>
> The kind of thing I'd like to see is support for URLs like:
>
>   http://education.data.gov.uk/doc/school
>     - list of all schools
>   http://education.data.gov.uk/doc/school/phase/nursery
>     - list of nursery schools
>   http://education.data.gov.uk/doc/school/administrativeDistrict/47UE
>     - list of schools whose administrative district is Worcester
>
> and so on.
>
> # Defining List Membership #
>
> The first question is: How do we define which resources are members
> of a list?
>
> I've discussed this quite briefly with Leigh and Ian (Davis). They
> seem to favour explicitly incorporating information about the
> membership of such lists within the triplestore itself. For example,
> perhaps something like:
>
>   <http://education.data.gov.uk/id/school>
>     a rdf:Bag ;
>     rdfs:label "All Schools"@en ;
>     rdfs:member
>       <http://education.data.gov.uk/id/school/100000> ,
>       <http://education.data.gov.uk/id/school/100001> ,
>       <http://education.data.gov.uk/id/school/100002> ,
>       ...
>
>   <http://education.data.gov.uk/id/school/phase/nursery>
>     a rdf:Bag ;
>     rdfs:label "Nursery Schools"@en ;
>     rdfs:member
>       <http://education.data.gov.uk/id/school/500003> ,
>       <http://education.data.gov.uk/id/school/500004> ,
>       <http://education.data.gov.uk/id/school/500005> ,
>       ...
>
>   <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
>     a rdf:Bag ;
>     rdfs:label "Schools in Worcester"@en ;
>     rdfs:member
>       <http://education.data.gov.uk/id/school/116749> ,
>       <http://education.data.gov.uk/id/school/116750> ,
>       <http://education.data.gov.uk/id/school/116751> ,
>       ...
>
> This is reasonably nice in that you can make up whatever lists you
> like without being tied to particular conventions in the list URI.
> It also means that the information about what things are in what
> lists is right there, and queryable, within the triplestore. So you
> could find nursery schools in Worcester with:
>
>   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>   SELECT ?school
>   WHERE {
>     <http://education.data.gov.uk/id/school/phase/nursery>
>       rdfs:member ?school .
>     <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
>       rdfs:member ?school .
>   }
>
> The difficulty lies in ensuring that the lists are correct to start
> with, especially when the data might come from multiple sources with
> their own generation routines, and that they remain up to date as the
> data changes over time. In particular, I think this approach really
> prevents you from layering an API over an existing set of data: the
> publishers of the linked data also have to determine how the API
> works, when otherwise those roles could be reasonably cleanly
> separated.
>
> An alternative is to somehow define the lists in terms of a SPARQL
> query or a higher-level declarative mechanism. For example, we might
> do:
>
>   <http://education.data.gov.uk/id/school>
>     a api:List ;
>     rdfs:label "Schools"@en ;
>     api:itemType <http://education.data.gov.uk/def/school/School> .
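The two membership styles quoted above can be compared with a small Python sketch over an in-memory set of triples: a list's members are its stated rdfs:member values plus, when the list carries an api:itemType, every resource typed with that class. Intersecting two member sets answers the same question as the SPARQL query. This is a toy model rather than a triplestore; the numbered school URIs and their memberships are made up, and api:itemType is kept as a bare placeholder string because no namespace for api: was ever fixed.

```python
# Toy triple store: a set of (subject, predicate, object) strings.
# School URIs 1-3 and their memberships are made up for illustration.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"
API_ITEM_TYPE = "api:itemType"  # placeholder: no namespace was fixed for api:

ALL_SCHOOLS = "http://education.data.gov.uk/id/school"
NURSERY = ALL_SCHOOLS + "/phase/nursery"
WORCESTER = ALL_SCHOOLS + "/administrativeDistrict/47UE"
SCHOOL_CLASS = "http://education.data.gov.uk/def/school/School"
S1, S2, S3 = (f"{ALL_SCHOOLS}/{n}" for n in (1, 2, 3))

TRIPLES = {
    (S1, RDF_TYPE, SCHOOL_CLASS),
    (S2, RDF_TYPE, SCHOOL_CLASS),
    (S3, RDF_TYPE, SCHOOL_CLASS),
    # Explicit membership, as in the rdf:Bag examples.
    (NURSERY, RDFS_MEMBER, S1),
    (NURSERY, RDFS_MEMBER, S2),
    (WORCESTER, RDFS_MEMBER, S2),
    (WORCESTER, RDFS_MEMBER, S3),
    # Declarative definition, as in the api:List example.
    (ALL_SCHOOLS, API_ITEM_TYPE, SCHOOL_CLASS),
}


def objects(store, subject, predicate):
    """All o such that (subject, predicate, o) is in the store."""
    return {o for (s, p, o) in store if s == subject and p == predicate}


def subjects(store, predicate, obj):
    """All s such that (s, predicate, obj) is in the store."""
    return {s for (s, p, o) in store if p == predicate and o == obj}


def members(store, list_uri):
    """Union of stated rdfs:member values and instances of any api:itemType."""
    result = objects(store, list_uri, RDFS_MEMBER)
    for cls in objects(store, list_uri, API_ITEM_TYPE):
        result |= subjects(store, RDF_TYPE, cls)
    return result


# Same question as the SPARQL query: nursery schools in Worcester.
nursery_in_worcester = members(TRIPLES, NURSERY) & members(TRIPLES, WORCESTER)
```

The explicit route needs someone to keep the rdfs:member triples in step with the data, whereas the api:itemType route derives membership from triples the publisher already has, which is what would let an API be layered over existing data.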
>
> or:
>
>   <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
>     a api:List ;
>     rdfs:label "Schools in Worcester"@en ;
>     api:where "?item <http://education.data.gov.uk/def/school/districtAdministrative> <http://statistics.data.gov.uk/id/local-authority-district/47UE>" .
>
> or use something like SPIN [1] to express the query as RDF.
>
> Or we could go one level higher and do something like:
>
>   <http://education.data.gov.uk/id/school/phase/*>
>     a api:ListSet ;
>     rdfs:label "Schools By Phase of Education"@en ;
>     api:pattern "http://education.data.gov.uk/id/school/phase/(nursery|primary|secondary)"^^xsd:string ;
>     api:map [
>       api:regexGroup 1 ;
>       api:property rdf:type ;
>       api:enumeration [
>         api:token "nursery" ;
>         api:resource <http://education.data.gov.uk/def/school/TypeOfEstablishment_EY_Setting> ;
>       ],
>       ...
>     ] .
>
> It's not at all clear to me what the best approach is here. I tend
> to think that although a higher-level language might make things
> simpler in some ways, providing SPARQL queries gives the most
> flexibility. Anyone have any thoughts?
>
> # Pagination and Sorting #
>
> Lists are often going to be very long, so we'll need some way to
> support paging through the results that come back. It might also be
> useful to provide different sort orders. For example:
>
>   http://education.data.gov.uk/doc/school?sort=label&startIndex=21&itemsPerPage=20
>
> should give the second page of (20) results, in label order.
>
> What I thought here is that we should assign *collection* URIs like:
>
>   http://education.data.gov.uk/id/school
>
> These are unordered and unpaginated. A request would result in a 303
> redirect to the document:
>
>   http://education.data.gov.uk/doc/school
>
> which is the same as:
>
>   http://education.data.gov.uk/doc/school?sort=label&startIndex=1&itemsPerPage=20
>
> (say) and is the first page of the (ordered, paginated) list.
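The URL scheme in the pagination proposal, together with the api:ListSet pattern, can be sketched as three small Python functions using only the standard library: one maps an /id/ collection URI to the /doc/ URL a server would put in the Location header of a 303 response, one overlays the query string on default paging parameters so that /doc/school and /doc/school?sort=label&startIndex=1&itemsPerPage=20 describe the same page, and one binds the regex group from api:pattern to an rdf:type resource. The defaults are the "(say)" values quoted above, the function names are hypothetical, and the enumeration table holds only the nursery entry because the others were elided.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Defaults from the "(say)" example in the message; placeholders only.
DEFAULTS = {"sort": "label", "startIndex": 1, "itemsPerPage": 20}

# The api:pattern from the api:ListSet example, with dots escaped. Only the
# "nursery" token has a known resource; the other entries were elided.
PHASE_PATTERN = re.compile(
    r"http://education\.data\.gov\.uk/id/school/phase/(nursery|primary|secondary)"
)
PHASE_TOKENS = {
    "nursery": "http://education.data.gov.uk/def/school/TypeOfEstablishment_EY_Setting",
}


def redirect_target(id_uri):
    """Map an /id/ collection URI to the /doc/ URL for a 303 See Other."""
    parts = urlsplit(id_uri)
    if not parts.path.startswith("/id/"):
        return None  # not an /id/ URI, so no redirect
    return f"{parts.scheme}://{parts.netloc}/doc/{parts.path[len('/id/'):]}"


def page_params(doc_url):
    """Overlay the query string on the defaults so equivalent URLs compare equal."""
    query = parse_qs(urlsplit(doc_url).query)
    params = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        if key in query:
            # Keep the default's type: str for sort, int for the paging values
            # (a non-numeric value raises ValueError here).
            params[key] = type(default)(query[key][0])
    return params


def phase_type(list_uri):
    """Bind regex group 1 and look it up, mirroring api:regexGroup/api:enumeration."""
    match = PHASE_PATTERN.fullmatch(list_uri)
    return PHASE_TOKENS.get(match.group(1)) if match else None
```

Matching the whole URI with fullmatch, rather than searching within it, is an assumption; the api:pattern value in the message does not say whether it is anchored.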
> The RDF graph actually returned would be something like:
>
>   <http://education.data.gov.uk/id/school>
>     rdfs:label "Schools"@en ;
>     foaf:isPrimaryTopicOf <http://education.data.gov.uk/doc/school> .
>
>   <http://education.data.gov.uk/doc/school>
>     rdfs:label "Schools (First 20, Ordered Alphabetically)"@en ;
>     foaf:primaryTopic <http://education.data.gov.uk/id/school> ;
>     xhv:next <http://education.data.gov.uk/doc/school?startIndex=21> ;
>     ... other metadata ...
>     api:items (
>       <http://education.data.gov.uk/id/school/135160>
>       <http://education.data.gov.uk/id/school/135441>
>       <http://education.data.gov.uk/id/school/135868>
>       ...
>     ) .
>
>   <http://education.data.gov.uk/id/school/135160>
>     rdfs:label "# New Comm Pri @ Allaway Avenue" ;
>     ... other triples ...
>
>   ... statements about the other members of this list ...
>
> Note here that the triples about the collection are curtailed so that
> they don't include all the members of the collection (since including
> them would kinda defeat the purpose of the pagination). If the
> collection were defined through a mechanism other than a list of
> members, then including that configuration information would be a
> good thing to do here.
>
> One possibility is to curtail the information that's available about
> each item: we'd probably always want to include label and type, but
> maybe other things would be useful as well, like location in the
> case of a school. Then again, perhaps it's best just to include
> everything we can about those items (eg a labelled concise bounded
> description [2]); clients can always filter out what they don't need.
>
> To facilitate ordering, we need a capability to label the different
> properties with short names; this is similar to (maybe the same as)
> the requirement for JSON renditions of RDF, so it would make sense
> to use the same kind of rules (ie default to the local name of the
> property, provide configuration to override).
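Short names and paging could fit together roughly as in this Python sketch: short_name defaults to the property's local name (whatever follows the last "#" or "/") unless an override is configured, and page sorts item URIs by the property with the requested short name, slices using the 1-based startIndex and itemsPerPage, and builds the xhv:next URL only when further items remain. The override entry, the function names and the dictionary shape of the items are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical configuration overriding the default short name of a property.
SHORT_NAME_OVERRIDES = {
    "http://education.data.gov.uk/def/school/districtAdministrative": "district",
}


def short_name(prop_uri, overrides=SHORT_NAME_OVERRIDES):
    """Configured name if present, otherwise the local name after the last '#' or '/'."""
    if prop_uri in overrides:
        return overrides[prop_uri]
    return prop_uri[max(prop_uri.rfind("#"), prop_uri.rfind("/")) + 1:]


def page(items, base_url, sort="label", start_index=1, items_per_page=20):
    """Sort item URIs by the property whose short name is `sort`, then slice.

    `items` maps each item URI to a {property URI: string value} dict.
    Returns the URIs on the requested page (start_index is 1-based) and
    the URL of the next page, or None when this is the last page.
    """
    def sort_key(uri):
        by_short_name = {short_name(p): v for p, v in items[uri].items()}
        # Items lacking the property sort first; the URI breaks ties so
        # that the order is stable between requests.
        return (by_short_name.get(sort, ""), uri)

    ordered = sorted(items, key=sort_key)
    first = start_index - 1
    selected = ordered[first:first + items_per_page]
    next_url = None
    if first + items_per_page < len(ordered):
        query = urlencode({"sort": sort,
                           "startIndex": start_index + items_per_page,
                           "itemsPerPage": items_per_page})
        next_url = f"{base_url}?{query}"
    return selected, next_url
```

With 20 items per page, the first page's next link carries startIndex=21, as in the xhv:next example; unlike that example, this sketch repeats sort and itemsPerPage in the link instead of relying on the defaults.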
>
> # Representations #
>
> There are four kinds of representations that are particularly useful
> for lists:
>
>   * RDF (RDF/XML, Turtle), obviously, for semweb heads
>   * a feed (Atom or RSS) for humans to subscribe to
>   * HTML for humans to look at
>   * JSON of some description for normal developers
>
> The first two could both be covered by using RSS 1.0 to describe the
> list. Does anyone have any thoughts about whether that would be the
> best approach? I think the only thing that makes me hesitate is the
> use of rdf:Seq to list the items, which I gather has fallen out of
> vogue.
>
> ---
>
> Anyway, hopefully Dave will set up a Google Code project where we
> can try to spec some of this out and maybe get some implementations
> in place.
>
> Any thoughts welcome!
>
> Jeni
>
> [1]: http://spinrdf.org/sp.html
> [2]: http://n2.talis.com/wiki/Bounded_Descriptions_in_RDF#Labelled_Concise_Bounded_Description
> --
> Jeni Tennison
> http://www.jenitennison.com
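If RSS 1.0 were used, one page of a list might be serialised as in this Python sketch with the standard library's xml.etree.ElementTree: the channel's items element wraps an rdf:Seq (the construct Jeni hesitates over), and each member is repeated as an item with a title and link. The function name and its arguments are hypothetical, and a real feed would also carry the paging and other metadata discussed in the message. Note that register_namespace changes a process-wide prefix table.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
# Global registration: rdf: prefix for RDF, default namespace for RSS 1.0.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("", RSS_NS)


def rdf(tag):
    return f"{{{RDF_NS}}}{tag}"


def rss(tag):
    return f"{{{RSS_NS}}}{tag}"


def rss10_page(page_url, title, description, items):
    """Serialise one page of a list; items is a list of (uri, label) in page order."""
    root = ET.Element(rdf("RDF"))
    channel = ET.SubElement(root, rss("channel"), {rdf("about"): page_url})
    ET.SubElement(channel, rss("title")).text = title
    ET.SubElement(channel, rss("link")).text = page_url
    ET.SubElement(channel, rss("description")).text = description
    # The ordered membership of this page, as an rdf:Seq inside rss:items.
    seq = ET.SubElement(ET.SubElement(channel, rss("items")), rdf("Seq"))
    for uri, _ in items:
        ET.SubElement(seq, rdf("li"), {rdf("resource"): uri})
    # One rss:item per member, identified by its URI.
    for uri, label in items:
        item = ET.SubElement(root, rss("item"), {rdf("about"): uri})
        ET.SubElement(item, rss("title")).text = label
        ET.SubElement(item, rss("link")).text = uri
    return ET.tostring(root, encoding="unicode")
```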
Received on Sunday, 13 December 2009 21:59:58 UTC