- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Sun, 30 Oct 2011 21:40:19 +0000
- To: www-archive@w3.org
Let's enumerate hash-free absolute URI usage contexts and
constituencies:
A) URIs used in actionable contexts (either user-mediated, as in <a
href="...">, or unmediated, as in <img src="..."> or <script
src="..."/>) in order to trigger retrieval of interpretable
'documents', which are then interpreted according to their media
type. Neither requestors nor provisioners care about what they
identify, but presumably they identify whatever the media type
thereof says is the 'meaning' of the retrieved
message. Provisioners almost always report 200 or 404, occasionally
302. The division between 'documents' with presentations,
e.g. text/html, image/jpeg, audio/mpeg on the one hand and
'documents' with non-directly perceivable procedural consequences,
e.g. text/css, text/javascript on the other is unknown, but,
particularly if we count types rather than tokens, probably heavily
biased in favour of the first group. The vast majority of
retrievals are done by browsers; presumably crawlers come next.
B) URIs used in referential contexts (RDF/XML, RDFa, Turtle, N3)
to identify subjects, relations or objects of RDF triples. In
principle there could be other (non-RDF) referential contexts---we
can for example imagine a version of KLONE which uses URIs for
identifiers. Retrieval is by definition speculative, in quest of
more triples (or other context-appropriate descriptive material,
e.g. more URI-KLONE).
From the provisioning side, there is a moderately complicated tree
of cases:
1) Nothing is known: 404, no problem
2) Thing identified is not an information resource and some
description 'document(s)' is/are available
a) Return a 303 plus Location: [of the description 'doc']
(See e.g. http://dbpedia.org/resource/Albert_Einstein);
b) Return a 200 plus (one of) the description 'doc(s)',
possibly including either or both of
Content-location: [uri of IR for the description doc]
<wdrs:describedby rdf:resource='[uri of IR for the description doc]'/>
(See e.g. http://iandavis.com/2010/303/toucan,
http://schema.org/Thing,
http://www.uk-postcodes.com/postcode/EH125BB [Accept:
a/rdf+xml],
http://lsid.tdwg.org/urn:lsid:ubio.org:namebank:11815);
c) Return a 302 plus Location: [of a description 'doc']
(See (!) e.g. http://purl.org/dc/elements/1.1/identifier);
3) Thing identified is an information resource but only a/some
description 'document(s)' is/are available
(No known examples, but imagine e.g. some RDF about the 2020
US census report)
Same alternatives as (B2a) and (B2b)
4) Thing identified is an information resource _and_ a/some
description 'document(s)' is/are available. 200 + a
representation is the only possible result. The description
may be embedded in the representation as RDFa if the
representation is XML or HTML (see
e.g. http://sercompetitivos.com/?ibsa=share&id=1590,
http://www.somebits.com/weblog/culture/blogs/ccLicense.html),
or synthesised from one or more <link rel=...> or <meta
rel=...> elements in the <head> (see
e.g. http://en.wikipedia.org/wiki/Organelle,
http://lod.geospecies.org/bioclasses/aQado.xhtml), if the
representation is HTML*. Regardless of media type other
discovery mechanisms have been canvassed, including Link:
headers, .well-known provision, etc., but I'm not aware of
_any_ examples of these in use.
How well any given tool which gets such a response does at
tracking down/locating the description varies widely, I expect.
5) Thing described is an information resource, a/some description
'document(s)' is/are available, but they describe a landing
page URI, not the URI for the resource itself. Both the
landing page and the resource are served with 200 + a
representation, which in all examples I'm aware of carries the
description embedded as RDFa (see
e.g. http://www.flickr.com/photos/62234213@N00/354736733,
[couldn't find a journal article landing page example])
Not aware of any tool which is capable of sorting out the
confusion here.
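The tree of cases above can be roughly sketched from the client side. This is a minimal illustration under stated assumptions, not an existing tool: the function name classify_response and the label strings are hypothetical, and B3 is folded into the B2 labels because it uses the same alternatives. Status and headers alone cannot say whether the thing identified is an information resource, so a plain 200 is left undetermined.

```python
from typing import Mapping


def classify_response(status: int, headers: Mapping[str, str]) -> str:
    """Guess which provisioning case a response is consistent with.

    Hypothetical helper: the labels only echo the case numbers in the text.
    """
    # Header names are case-insensitive, so normalise them first.
    names = {name.lower() for name in headers}
    if status == 404:
        return "B1"
    if status == 303 and "location" in names:
        return "B2a"
    if status == 302 and "location" in names:
        return "B2 (302 variant)"
    if status == 200 and "content-location" in names:
        return "B2b (with Content-Location)"
    if status == 200:
        # Could be B2b, B4 or B5; the body would have to be inspected.
        return "B2b/B4/B5 (undetermined)"
    return "unclassified"
```

The undetermined bucket is the point of the sketch: telling B2b, B4 and B5 apart needs the retrieved representation, not just the response line and headers.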
A of course hugely dominates B numerically. Within B, obviously a lot
of B5 because of flickr. B2 is LOD heartland, B3 doesn't raise any
issues that B2 doesn't. The majority of the B4 cases are harmless,
because the representation is HTML, the resource is an
ordinary-language:document and there's no other referent in the
picture. The subcase where the representation is RDF (or N3 or
Turtle...) and there _are_ two resources in play is rare (?).
Hmm. Sindice finds 300K RDF pages with cc:license statements.
First one is
http://carpictures.cc/cars/photo/car_picture/13037/grey_mercedes_rear_license_plate_blank.rdf
which actually illustrates the landing page problem, not the pun
problem. That URI yields an HTML page with a picture of a
merc. embedded in it and
<link rel="alternate" type="application/rdf+xml" title="RDF/XML
Representation"
href="http://carpictures.cc/cars/photo/car_picture/13037/grey_mercedes_rear_license_plate_blank.rdf"
/>
The RDF itself includes
<http://carpictures.cc/cars/photo/car_picture/13037/grey_mercedes_rear_license_plate_blank>
<http://creativecommons.org/ns#license>
<http://creativecommons.org/licenses/by/2.0>
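A <link rel="alternate"> element like the one quoted above is the kind of hook a tool could follow to locate the description. The following is a minimal sketch using only Python's standard html.parser module; the class name RdfAlternateFinder and the function find_rdf_alternates are hypothetical, and the sketch assumes the media type is spelled exactly application/rdf+xml.

```python
from html.parser import HTMLParser


class RdfAlternateFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... /> via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        is_rdf = attributes.get("type") == "application/rdf+xml"
        if "alternate" in rels and is_rdf and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_rdf_alternates(html_text):
    """Return the RDF/XML alternate links found in an HTML string."""
    finder = RdfAlternateFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hrefs
```

Finding the link only locates the RDF. It does nothing to settle whether the subject of the triples in it is the landing page or the picture.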
And there is also
http://rdf.ecs.soton.ac.uk/degree/csInt
which illustrates a careful pattern -- that document contains
assertions about both
itself, that is http://rdf.ecs.soton.ac.uk/degree/csInt
including a cc:license and rdf:type Ontology
as well as assertions about
what it denotes, that is http://id.ecs.soton.ac.uk/degree/csInt
including rdf:type ...:...Degree and ...:hasCohort
An interesting 3rd-party pun mistake turns up in
http://purl.org/derecho
where we find
<http://dbpedia.org/resource/Law>
<http://creativecommons.org/ns#license>
<http://creativecommons.org/licenses/by-nc-sa/3.0/es>
Which is wrong: that URI identifies the concept, and the predicate is only true of
http://dbpedia.org/page/Law and http://dbpedia.org/data/Law
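The relationship between the three DBpedia URIs can be sketched as a simple path rewrite. This is a minimal illustration that assumes only the /resource/, /page/ and /data/ path pattern visible in the URIs above; the function name dbpedia_document_uris is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def dbpedia_document_uris(resource_uri):
    """Derive the /page/ and /data/ document URIs for a /resource/ URI."""
    parts = urlsplit(resource_uri)
    prefix = "/resource/"
    if parts.netloc != "dbpedia.org" or not parts.path.startswith(prefix):
        raise ValueError("not a dbpedia.org /resource/ URI: " + resource_uri)
    name = parts.path[len(prefix):]
    # Same scheme and host; only the first path segment changes.
    return tuple(
        urlunsplit((parts.scheme, parts.netloc, "/" + kind + "/" + name, "", ""))
        for kind in ("page", "data")
    )
```

For http://dbpedia.org/resource/Law this yields the two document URIs named above, which are the ones a license statement could sensibly be about.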
And here's a case where the pun is OK! That is, the predication is
true on _either_ reading. In
http://purl.org/NET/cidoc-crm/core
we have
<http://purl.org/NET/cidoc-crm/core>
<http://creativecommons.org/ns#license>
<http://creativecommons.org/licenses/by/3.0/>
_and_
<http://purl.org/NET/cidoc-crm/core>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2002/07/owl#Ontology>
But it's OK to have the license apply either to the document or to the
ontology it describes.
But finally we also have
http://squio.nl/blog/triplify/user/1
which contains
<http://squio.nl/blog/triplify/user/1>
<http://creativecommons.org/ns#license>
<http://creativecommons.org/licenses/by/3.0/us/>
as well as
<http://squio.nl/blog/triplify/user/1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Person>
Bingo.
But I had to look pretty hard to find that (about 90 minutes looking
at Sindice results).
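The hand search described here amounts to looking for subjects that carry a cc:license and are also typed as something that is clearly not a document. A minimal sketch of that check, assuming the triples are already available as plain (subject, predicate, object) tuples of URI strings; the function name find_license_puns and the default set of non-document classes are assumptions for illustration only.

```python
CC_LICENSE = "http://creativecommons.org/ns#license"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def find_license_puns(triples, non_document_classes=frozenset({FOAF_PERSON})):
    """Return subjects that are licensed and typed as a non-document class."""
    licensed = {s for s, p, o in triples if p == CC_LICENSE}
    typed_non_document = {
        s for s, p, o in triples
        if p == RDF_TYPE and o in non_document_classes
    }
    # A pun candidate is a URI that appears in both sets.
    return sorted(licensed & typed_non_document)


# The squio.nl triples quoted above.
squio = [
    ("http://squio.nl/blog/triplify/user/1", CC_LICENSE,
     "http://creativecommons.org/licenses/by/3.0/us/"),
    ("http://squio.nl/blog/triplify/user/1", RDF_TYPE, FOAF_PERSON),
]
print(find_license_puns(squio))
```

With the default class set, the cidoc-crm case would not be flagged, since owl:Ontology is not listed as a non-document class. That matches the reading above that the pun there is harmless.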
Sigh, this is much too long and diffuse, but it's a record of how I
spent several hours a day for the last three days. More thoughts
about all this when I can. . .
ht
* Searching for "cell organelles" with the free to use box ticked in
Google advanced search, the hits break down about 60-40 between ones
with <a href="[cc license]"> in the HTML somewhere and ones with <link
rel="license|copyright" href="...cc..."/>. The former don't count
as far as I'm concerned, since none of them are recognisable as RDFa
(they lack rel="(cc:)license").
There are relatively few <a rel="cc:license"> on any pages -- about
46K according to Sindice, somewhat fewer rel="license" -- about 17K.
--
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]
Received on Sunday, 30 October 2011 21:40:49 UTC