- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Sun, 30 Oct 2011 21:40:19 +0000
- To: www-archive@w3.org
Let's enumerate hash-free absolute URI usage contexts and constituencies:

A) URIs used in actionable contexts (either user-mediated, as in <a href="...">, or unmediated, as in <img src="..."> or <script src="..."/>) in order to trigger retrieval of interpretable 'documents', which are then interpreted according to their media type.

   Neither requestors nor provisioners care about what they identify, but presumably they identify whatever the media type thereof says is the 'meaning' of the retrieved message.

   Provisioners almost always report 200 or 404, occasionally 302.

   The division between 'documents' with presentations, e.g. text/html, image/jpg, audio/mp3 on the one hand and 'documents' with non-directly perceivable procedural consequences, e.g. text/css, text/javascript on the other is unknown, but, particularly if we count types rather than tokens, probably heavily biased in favour of the first group.

   The vast majority of retrievals are done by browsers; presumably crawlers come next.

B) URIs used in referential contexts (RDF/XML, RDFa, Turtle, N3) to identify subjects, relations or objects of RDF triples. In principle there could be other (non-RDF) referential contexts---we can for example imagine a version of KLONE which uses URIs for identifiers.

   Retrieval is by definition speculative, in quest of more triples (or other context-appropriate descriptive material, e.g. more URI-KLONE).

   From the provisioning side, there is a moderately complicated tree of cases:

   1) Nothing is known: 404, no problem

   2) Thing identified is not an information resource and some description 'document(s)' is/are available

      a) Return a 303 plus Location: [of the description 'doc'] (See e.g. http://dbpedia.org/resource/Albert_Einstein);

      b) Return a 200 plus (one of) the description 'doc(s)', possibly including either or both of
           Content-Location: [uri of IR for the description doc]
           <wdrs:describedby rdf:resource=' " '/>
         (See e.g. http://iandavis.com/2010/303/toucan, http://schema.org/Thing, http://www.uk-postcodes.com/postcode/EH125BB [Accept: application/rdf+xml], http://lsid.tdwg.org/urn:lsid:ubio.org:namebank:11815);

      c) Return a 302 plus Location: [of a description 'doc'] (See (!) e.g. http://purl.org/dc/elements/1.1/identifier);

   3) Thing identified is an information resource but only a/some description 'document(s)' is/are available (No known examples, but imagine e.g. some RDF about the 2020 US census report)

      Same alternatives as (B2a) and (B2b)

   4) Thing identified is an information resource _and_ a/some description 'document(s)' is/are available.

      200 + a representation is the only possible result. The description may be embedded in the representation as RDFa if the representation is XML or HTML (see e.g. http://sercompetitivos.com/?ibsa=share&id=1590, http://www.somebits.com/weblog/culture/blogs/ccLicense.html), or synthesised from one or more <link rel=...> or <meta rel=...> elements in the <head> (see e.g. http://en.wikipedia.org/wiki/Organelle, http://lod.geospecies.org/bioclasses/aQado.xhtml), if the representation is HTML*.

      Regardless of media type, other discovery mechanisms have been canvassed, including Link: headers, .well-known provision, etc., but I'm not aware of _any_ examples of these in use.

      How well any given tool which gets such a response does at tracking down/locating the description varies widely, I expect.

   5) Thing described is an information resource, a/some description 'document(s)' is/are available, but they describe a landing page URI, not the URI for the resource itself.

      Both the landing page and the resource are served with 200 + a representation, which in all examples I'm aware of carries the description embedded as RDFa (see e.g. http://www.flickr.com/photos/62234213@N00/354736733, [couldn't find a journal article landing page example]).

      I'm not aware of any tool which is capable of sorting out the confusion here.

A of course hugely dominates B numerically.
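Seen from the consuming side, the status codes in the tree of B cases above could be sorted roughly as follows. This is only a sketch under stated assumptions: `classify_response` and its return labels are made up for illustration (they do not come from any tool mentioned here), the mapping is a heuristic rather than anything a specification defines, and a bare 200 cannot by itself distinguish B2b, B4 and B5 without looking inside the body.

```python
from typing import Mapping, Optional


def classify_response(status: int,
                      headers: Optional[Mapping[str, str]] = None) -> str:
    """Map an HTTP status (plus headers) onto the B cases listed above.

    Hypothetical helper: the labels are shorthand for the cases in the
    text, and the mapping is a guess a client might make, nothing more.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in (headers or {}).items()}

    if status == 404:
        return "B1"                 # nothing is known
    if status == 303 and "location" in h:
        return "B2a-or-B3"          # redirect to a description 'doc'
    if status == 302 and "location" in h:
        return "B2-302"             # the 302 variant of B2
    if status == 200:
        if "content-location" in h:
            return "B2b-or-B3"      # hint that the body is a description
        return "200-ambiguous"      # B2b, B4 or B5: must inspect the body
    return "unclassified"
```

For example, `classify_response(303, {"Location": "http://example.org/doc"})` falls into the 303 branch, whereas a plain 200 lands in the ambiguous bucket, which is where RDFa or <link rel=...> inspection would have to take over.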
Within B, obviously a lot of B5 because of flickr. B2 is LOD heartland, B3 doesn't raise any issues that B2 doesn't.

The majority of the B4 cases are harmless, because the representation is HTML, the resource is an ordinary-language:document and there's no other referent in the picture. The subcase where the representation is RDF (or N3 or Turtle...) and there _are_ two resources in play is rare (?).

Hmm. Sindice finds 300K RDF pages with cc:license statements. First one is

  http://carpictures.cc/cars/photo/car_picture/13037/grey_mercedes_rear_license_plate_blank.rdf

which actually illustrates the landing page problem, not the pun problem. That URI yields an HTML page with a picture of a merc. embedded in it and

  <link rel="alternate" type="application/rdf+xml" title="RDF/XML Representation" href="http://carpictures.cc/cars/photo/car_picture/13037/grey_mercedes_rear_license_plate_blank.rdf" />

The RDF itself includes

  <http://carpictures.cc/cars/photo/car_picture/13037/grey_mercedes_rear_license_plate_blank>
    <http://creativecommons.org/ns#license>
      <http://creativecommons.org/licenses/by/2.0>

And there is also

  http://rdf.ecs.soton.ac.uk/degree/csInt

which illustrates a careful pattern -- that document contains assertions about both itself, that is

  http://rdf.ecs.soton.ac.uk/degree/csInt

including a cc:license and rdf:type Ontology, as well as assertions about what it denotes, that is

  http://id.ecs.soton.ac.uk/degree/csInt

including rdf:type ...:...Degree and ...:hasCohort

An interesting 3rd-party pun mistake turns up in

  http://purl.org/derecho

where we find

  <http://dbpedia.org/resource/Law>
    <http://creativecommons.org/ns#license>
      <http://creativecommons.org/licenses/by-nc-sa/3.0/es>

Which is wrong -- that's the concept -- the predicate is only true of

  http://dbpedia.org/page/Law

and

  http://dbpedia.org/data/Law

And here's a case where the pun is OK! That is, the predication is true on _either_ reading.
In http://purl.org/NET/cidoc-crm/core we have

  <http://purl.org/NET/cidoc-crm/core>
    <http://creativecommons.org/ns#license>
      <http://creativecommons.org/licenses/by/3.0/>

_and_

  <http://purl.org/NET/cidoc-crm/core>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://www.w3.org/2002/07/owl#Ontology>

But it's OK to have the license apply either to the document or to the ontology it describes.

But finally we also have

  http://squio.nl/blog/triplify/user/1

which contains

  <http://squio.nl/blog/triplify/user/1>
    <http://creativecommons.org/ns#license>
      <http://creativecommons.org/licenses/by/3.0/us/>

as well as

  <http://squio.nl/blog/triplify/user/1>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://xmlns.com/foaf/0.1/Person>

Bingo. But I had to look pretty hard to find that (about 90 minutes looking at Sindice results).

Sigh, this is much too long and diffuse, but it's a record of how I spent several hours a day for the last three days.

More thoughts about all this when I can.

ht

* Searching for "cell organelles" with the free to use box ticked in Google advanced search, the hits break down about 60-40 between ones with <a href="[cc license]"> in the HTML somewhere and ones with <link rel="license|copyright" href="...cc..."/>. The former don't count as far as I'm concerned, since none of them are recognisable as RDFa (they lack rel="(cc:)license"). There are relatively few <a rel="cc:license"> on any pages -- about 46K according to Sindice, somewhat fewer rel="license" -- about 17K.

--
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]
Received on Sunday, 30 October 2011 21:40:49 UTC