- From: Herbert Van de Sompel <hvdsomp@gmail.com>
- Date: Sun, 13 Dec 2009 16:59:15 -0500
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: Linked Data community <public-lod@w3.org>, John Sheridan <John.Sheridan@nationalarchives.gsi.gov.uk>
Hi Jeni
I wonder whether ORE Aggregations could be (part of) a solution:
http://www.openarchives.org/ore/1.0/toc
Greetings
Herbert Van de Sompel
On Dec 13, 2009, at 15:20, Jeni Tennison <jeni@jenitennison.com> wrote:
> Hi,
>
> Dave (Reynolds) raised the point that lists are an integral part of
> most APIs. This is another thing that we know we need to address in
> the UK linked government data project, but are unsure as yet how
> best to do so.
>
> This is a bit of a brain dump of my current thinking, which is
> mostly packed with uncertainty! I'd be very grateful for any
> thoughts, guidance, pointers that you have.
>
> The situation is that we have our nice linked data all up and
> available at suitable URIs, for example:
>
> http://education.data.gov.uk/id/school/520965
>
> (somewhat password protected; if you ignore all the prompts you'll
> be able to see the HTML page just without any styling) but we need
> to somehow provide better mechanisms for people to navigate around it.
>
> The kind of thing I'd like to see is support for URLs like:
>
> http://education.data.gov.uk/doc/school
> - list of all schools
> http://education.data.gov.uk/doc/school/phase/nursery
> - list of nursery schools
> http://education.data.gov.uk/doc/school/administrativeDistrict/47UE
> - list of schools whose administrative district is Worcester
>
> and so on.
>
> # Defining List Membership #
>
> The first question is: How do we define which resources are members
> of a list?
>
> I've discussed this quite briefly with Leigh and Ian (Davis). They
> seem to favour explicitly incorporating information about the
> membership of such lists within the triplestore itself. For example,
> perhaps something like:
>
> <http://education.data.gov.uk/id/school>
> a rdf:Bag ;
> rdfs:label "All Schools"@en ;
> rdfs:member
> <http://education.data.gov.uk/id/school/100000> ,
> <http://education.data.gov.uk/id/school/100001> ,
> <http://education.data.gov.uk/id/school/100002> ,
> ...
>
> <http://education.data.gov.uk/id/school/phase/nursery>
> a rdf:Bag ;
> rdfs:label "Nursery Schools"@en ;
> rdfs:member
> <http://education.data.gov.uk/id/school/500003> ,
> <http://education.data.gov.uk/id/school/500004> ,
> <http://education.data.gov.uk/id/school/500005> ,
> ...
>
> <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
> a rdf:Bag ;
> rdfs:label "Schools in Worcester"@en ;
> rdfs:member
> <http://education.data.gov.uk/id/school/116749> ,
> <http://education.data.gov.uk/id/school/116750> ,
> <http://education.data.gov.uk/id/school/116751> ,
> ...
>
> This is reasonably nice in that you can make up whatever lists you
> like without being tied to particular conventions in the list URI.
> It also means that the information about what things are in what
> lists is right there, and queryable, within the triplestore. So you
> could find nursery schools in Worcester with:
>
> SELECT ?school
> WHERE {
> <http://education.data.gov.uk/id/school/phase/nursery>
> rdfs:member ?school .
> <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
> rdfs:member ?school .
> }
>
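As a rough sketch of how an API layer might generate such a query, assuming each list resource is the subject of rdfs:member as in the Turtle above, the Python below builds the intersection query for any number of lists. The function name intersection_query and its parameters are hypothetical, not part of any existing library.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def intersection_query(list_uris, variable="school"):
    """Build a SPARQL SELECT for resources that are members of every list given.

    Assumes membership is stored as <list> rdfs:member <item>.
    """
    if not list_uris:
        raise ValueError("at least one list URI is required")
    # One triple pattern per list; sharing the variable makes it an intersection.
    patterns = "\n".join(
        f"  <{uri}> rdfs:member ?{variable} ." for uri in list_uris
    )
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        f"SELECT ?{variable}\n"
        "WHERE {\n"
        f"{patterns}\n"
        "}"
    )


if __name__ == "__main__":
    print(intersection_query([
        "http://education.data.gov.uk/id/school/phase/nursery",
        "http://education.data.gov.uk/id/school/administrativeDistrict/47UE",
    ]))
```

Because every pattern shares the same variable, adding a third list URI narrows the result further without changing the shape of the query.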
> The difficulty lies in ensuring that the lists are correct to start
> with, especially when the data might come from multiple sources with
> their own generation routines, and that it remains up to date as the
> data changes over time. In particular, I think this approach really
> prevents you from layering an API over an existing set of data: the
> publishers of the linked data have to also determine how the API
> works, when otherwise those roles could be reasonably cleanly
> separated.
>
> An alternative is to somehow define the lists in terms of a SPARQL
> query or a higher-level declarative mechanism. For example, we might
> do:
>
> <http://education.data.gov.uk/id/school>
> a api:List ;
> rdfs:label "Schools"@en ;
> api:itemType <http://education.data.gov.uk/def/school/School> .
>
> or:
>
> <http://education.data.gov.uk/id/school/administrativeDistrict/47UE>
> a api:List ;
> rdfs:label "Schools in Worcester"@en ;
> api:where "?item <http://education.data.gov.uk/def/school/districtAdministrative> <http://statistics.data.gov.uk/id/local-authority-district/47UE>" .
>
> or use something like SPIN [1] to express the query as RDF.
>
> Or we could go one level higher and do something like:
>
> <http://education.data.gov.uk/id/school/phase/*>
> a api:ListSet ;
> rdfs:label "Schools By Phase of Education"@en ;
> api:pattern "http://education.data.gov.uk/id/school/phase/(nursery|primary|secondary)"^^xsd:string ;
> api:map [
> api:regexGroup 1 ;
> api:property rdf:type ;
> api:enumeration [
> api:token "nursery" ;
> api:resource <http://education.data.gov.uk/def/school/TypeOfEstablishment_EY_Setting> ;
> ],
> ...
> ] .
>
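To make the pattern-based idea concrete, here is a minimal Python sketch of how such a ListSet configuration might be resolved into a triple pattern. The dictionary layout and the function name where_clause are assumptions for illustration only. The enumeration holds just the one mapping given above; the mappings for the other tokens are elided in the original, so they are left unconfigured here.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical in-memory form of the api:ListSet description.
# The pattern string is copied as written (dots are left unescaped).
LIST_SET = {
    "pattern": "http://education.data.gov.uk/id/school/phase/(nursery|primary|secondary)",
    "regex_group": 1,
    "property": RDF_TYPE,
    "enumeration": {
        "nursery": "http://education.data.gov.uk/def/school/TypeOfEstablishment_EY_Setting",
    },
}


def where_clause(list_uri, list_set=LIST_SET):
    """Return a SPARQL triple pattern for list_uri, or None if it cannot be mapped."""
    match = re.fullmatch(list_set["pattern"], list_uri)
    if match is None:
        return None  # the URI does not belong to this list set
    token = match.group(list_set["regex_group"])
    resource = list_set["enumeration"].get(token)
    if resource is None:
        return None  # token matched the pattern but has no configured resource
    return f"?item <{list_set['property']}> <{resource}> ."


if __name__ == "__main__":
    print(where_clause("http://education.data.gov.uk/id/school/phase/nursery"))
```

Returning None for an unmapped token keeps a configuration gap visible rather than silently producing an empty list.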
> It's not at all clear to me what the best approach is here. I tend
> to think that although a higher-level language might make things
> simpler in some ways, providing SPARQL queries gives the most
> flexibility. Anyone have any thoughts?
>
> # Pagination and Sorting #
>
> Lists are often going to be very long, so we'll need some way to
> support paging through the results that come back. It might also be
> useful to provide different sort orders. For example:
>
> http://education.data.gov.uk/doc/school?sort=label&startIndex=21&itemsPerPage=20
>
> should give the second page of (20) results, in label order.
>
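One way the paging parameters might translate into SPARQL solution modifiers is sketched below in Python. It assumes startIndex is 1-based (so startIndex=21 skips 20 results), that the sort value names a query variable, and that the defaults are sort=label and 20 items per page. The function name paging_modifiers is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def paging_modifiers(url, default_sort="label", default_per_page=20):
    """Turn sort/startIndex/itemsPerPage into ORDER BY / LIMIT / OFFSET."""
    query = parse_qs(urlsplit(url).query)
    sort = query.get("sort", [default_sort])[0]
    start = int(query.get("startIndex", ["1"])[0])
    per_page = int(query.get("itemsPerPage", [str(default_per_page)])[0])
    if start < 1 or per_page < 1:
        raise ValueError("startIndex and itemsPerPage must be positive")
    # Only allow simple names so the value cannot alter the query structure.
    if not sort.isidentifier():
        raise ValueError("sort must be a simple property short name")
    # startIndex is 1-based, SPARQL OFFSET is 0-based.
    return f"ORDER BY ?{sort} LIMIT {per_page} OFFSET {start - 1}"


if __name__ == "__main__":
    print(paging_modifiers(
        "http://education.data.gov.uk/doc/school"
        "?sort=label&startIndex=21&itemsPerPage=20"
    ))
```

A stable ORDER BY matters here: without it, LIMIT and OFFSET give no guarantee that successive pages are disjoint.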
> What I thought here is that we should assign *collections* URIs like:
>
> http://education.data.gov.uk/id/school
>
> These are unordered and unpaginated. A request would result in a 303
> redirect to the document:
>
> http://education.data.gov.uk/doc/school
>
> which is the same as:
>
> http://education.data.gov.uk/doc/school?sort=label&startIndex=1&itemsPerPage=20
>
> (say) and is the first page of the (ordered, paginated) list. The
> RDF graph actually returned would be something like:
>
> <http://education.data.gov.uk/id/school>
> rdfs:label "Schools"@en ;
> foaf:isPrimaryTopicOf <http://education.data.gov.uk/doc/school> .
>
> <http://education.data.gov.uk/doc/school>
> rdfs:label "Schools (First 20, Ordered Alphabetically)"@en ;
> foaf:primaryTopic <http://education.data.gov.uk/id/school> ;
> xhv:next <http://education.data.gov.uk/doc/school?startIndex=21> ;
> ... other metadata ...
> api:items (
> <http://education.data.gov.uk/id/school/135160>
> <http://education.data.gov.uk/id/school/135441>
> <http://education.data.gov.uk/id/school/135868>
> ...
> ) .
>
> <http://education.data.gov.uk/id/school/135160>
> rdfs:label "# New Comm Pri @ Allaway Avenue" ;
> ... other triples ...
>
> ... statements about the other members of this list ...
>
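The xhv:next link in a page like this could be computed from the current page URL. The Python sketch below assumes a 1-based startIndex, a default page size of 20, and a total item count supplied by the caller; total_items and the function name next_page_url are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url, total_items, default_per_page=20):
    """Return the URL of the following page, or None on the last page."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    start = int(query.get("startIndex", ["1"])[0])
    per_page = int(query.get("itemsPerPage", [str(default_per_page)])[0])
    next_start = start + per_page
    if next_start > total_items:
        return None  # no further items, so no xhv:next link
    # Keep any other parameters (such as sort) and only move the start index.
    query["startIndex"] = [str(next_start)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    print(next_page_url("http://education.data.gov.uk/doc/school", total_items=100))
```

Carrying the sort parameter through to the next link keeps the ordering consistent from one page to the next.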
> Note here that the triples about the collection are curtailed to not
> include all the members of the collection (since to include them
> would kinda defeat the purpose of the pagination). If the collection
> were defined through a mechanism other than a list of members, then
> including that configuration information would be a good thing to do
> here.
>
> One possibility is to curtail the information that's available about
> each item: we'd probably always want to include label and type, but
> maybe other things would be useful as well, like location in the
> case of a school. Then again, perhaps it's best just to include
> everything we can about those items (eg a labelled concise bounded
> description [2]); clients can always filter out what they don't need.
>
> To facilitate ordering, we need a capability to label the different
> properties with short names; this is similar to (maybe the same as)
> the requirement for JSON renditions of RDF, so it would make sense
> to use the same kind of rules (ie default to the local name of the
> property, provide configuration to override).
>
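The short-name rule described here (default to the local name, allow configured overrides) could look roughly like the Python below. The function name short_name and the shape of the overrides mapping are assumptions for illustration.

```python
import re


def short_name(property_uri, overrides=None):
    """Return a short name for a property URI.

    Uses a configured override when one exists, otherwise the local name,
    taken here to be the text after the last '#' or '/'.
    """
    if overrides and property_uri in overrides:
        return overrides[property_uri]
    local = re.split(r"[#/]", property_uri)[-1]
    if not local:
        raise ValueError(f"no local name in {property_uri!r}")
    return local


if __name__ == "__main__":
    print(short_name("http://www.w3.org/2000/01/rdf-schema#label"))
    print(short_name("http://education.data.gov.uk/def/school/districtAdministrative"))
```

Overrides are needed as soon as two properties from different vocabularies share a local name, since the short names have to be unambiguous within one API.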
> # Representations #
>
> There are four kinds of representations that are particularly useful
> for lists:
>
> * RDF (RDF/XML, Turtle), obviously, for semweb heads
> * a feed (Atom or RSS) for humans to subscribe to
> * HTML for humans to look at
> * JSON of some description for normal developers
>
> The first two could both be covered by using RSS 1.0 to describe the
> list. Does anyone have any thoughts about whether that would be the
> best approach? I think the only thing that makes me hesitate is the
> use of rdf:Seq to list the items, which I gather has fallen out of
> vogue.
>
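If the different representations were served from one list URL, the choice between them might be made from the HTTP Accept header. The Python sketch below is only an illustration under that assumption: the set of media types, the HTML fallback for */*, and the function name choose_representation are all hypothetical, and only exact media types and */* are handled (no partial wildcards such as text/*).

```python
# Assumed media types for the representations listed above.
SUPPORTED = [
    "text/html",
    "application/rdf+xml",
    "text/turtle",
    "application/atom+xml",
    "application/json",
]


def choose_representation(accept_header, supported=SUPPORTED):
    """Pick the supported media type the client prefers, or None."""
    ranked = []
    for position, part in enumerate(accept_header.split(",")):
        fields = part.strip().split(";")
        media_type = fields[0].strip()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranked.append((-q, position, media_type))
    # Highest q first; ties keep the order in which the client listed them.
    for neg_q, _, media_type in sorted(ranked):
        if neg_q == 0:
            continue  # q=0 means "not acceptable"
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return supported[0]
    return None


if __name__ == "__main__":
    print(choose_representation("text/turtle;q=0.9, application/json"))
```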
> ---
>
> Anyway, hopefully Dave will set up a Google Code project where we
> can try to spec some of this out and maybe get some implementations
> in place.
>
> Any thoughts welcome!
>
> Jeni
>
> [1]: http://spinrdf.org/sp.html
> [2]: http://n2.talis.com/wiki/Bounded_Descriptions_in_RDF#Labelled_Concise_Bounded_Description
> --
> Jeni Tennison
> http://www.jenitennison.com
>
>
Received on Sunday, 13 December 2009 21:59:58 UTC