Re: How To Handle WebIDs for (X)HTML based Claim Bearing Resources from Mo McRoberts on 2011-12-30 (public-xg-webid@w3.org from December 2011)

From: Mo McRoberts <mo.mcroberts@bbc.co.uk>
Date: Fri, 30 Dec 2011 09:07:15 +0000
To: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>
Cc: Kingsley Idehen <kidehen@openlinksw.com>, WebID XG <public-xg-webid@w3.org>
Message-Id: <821E779A-61E2-4827-ADD6-F3B51DA6BD47@bbc.co.uk>

On 29 Dec 2011, at 23:26, Jürgen Jakobitsch wrote:

> hi,
> 
> what i can tell from many discussions with customers is that
> people are kind of picky about their uris.
> the "cool uri" meme is often the only thing of interest, no matter
> what technical advantages there are.

“Cool URIs”[0] mostly means “don’t put transient cruft in your URIs, because it makes it very likely that you’ll need to change them later, causing pain for you or your users”.

Do you mean (so-called) “search engine-friendly”[1] URIs? Not that the two necessarily have to conflict.

We do get quite a bit of pushback about our URIs on http://www.bbc.co.uk/programmes because they use opaque identifiers which result in URIs like this:

http://www.bbc.co.uk/programmes/b006m86d#programme

instead of:

http://www.bbc.co.uk/programmes/eastenders#programme

The rationale is that the URIs are generated automatically (which will come as no surprise to anybody) and the source data is often imperfect. Making the URIs “friendlier” carries three risks: they may need to change when the data is corrected (so mechanisms have to exist to support that, with an associated cost); they may fall out of step with the corrected data (unpalatable to editorial people); or the “sanitised” form of the title used to generate the URI may result in something unfortunate.

A notorious example of the last of these came about a year or so ago on the BBC Recipes site, which took the “sanitised title” approach: a recipe entitled “Carrots glazed in cumin and orange” was sanitised to an identifier truncated after the “m” in “cumin”, resulting in quite a rude-looking URI.
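
To make the failure mode concrete, here is a minimal sketch of title-derived identifiers. It is not the Recipes site's actual code: the function names, the sanitising rule and the 21-character limit are all assumptions, chosen only so that a fixed-length cut lands after the “m” in “cumin” as described above.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def truncate_naive(slug: str, max_len: int) -> str:
    """Cut at a fixed length, even if that lands in the middle of a word."""
    return slug[:max_len]


def truncate_at_word(slug: str, max_len: int) -> str:
    """Cut at the last hyphen that keeps the result within max_len characters."""
    if len(slug) <= max_len:
        return slug
    # Look one character past the limit so a hyphen sitting exactly at
    # position max_len still counts as a clean word boundary.
    head = slug[: max_len + 1]
    cut = head.rfind("-")
    # A single word longer than max_len has no boundary; fall back to a hard cut.
    return head[:cut] if cut > 0 else slug[:max_len]


# Hypothetical limit: 21 is an assumption, not a documented BBC value.
MAX_LEN = 21
slug = slugify("Carrots glazed in cumin and orange")
whole_words_only = truncate_at_word(slug, MAX_LEN)  # "carrots-glazed-in"
mid_word_cut = truncate_naive(slug, MAX_LEN)        # stops partway through "cumin"
```

Cutting on a word boundary avoids this particular accident, but it does nothing about the other two risks: a title-derived identifier still changes (or goes stale) when the title is corrected, which an opaque identifier such as b006m86d does not.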

However, all of this is somewhat by-the-by. The bottom line is that a published URI is a commitment to the rest of the Web: if somebody is willing to bear the responsibility of ensuring that resolution will result in a sensible response for the foreseeable future (be that a 200, a 3xx, or a 410), then they can insist on picking whatever crazy mixed-up scheme they like, so long as it works. We’ve all done it at some point or another…
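
As a rough illustration of what that commitment looks like in practice, the sketch below maps every URI that has ever been published to one of the three “sensible” outcomes. The lookup tables and the paths in them are hypothetical (in particular, nothing here claims that the BBC redirects a friendly alias); the point is only that published URIs keep an entry somewhere rather than silently falling through to a 404.

```python
from http import HTTPStatus
from typing import Optional, Tuple

# Hypothetical tables; the paths are illustrative only.
LIVE = {"/programmes/b006m86d"}                              # still served directly
MOVED = {"/programmes/old-alias": "/programmes/b006m86d"}    # renamed: redirect
GONE = {"/programmes/withdrawn-example"}                     # deliberately retired


def resolve(path: str) -> Tuple[HTTPStatus, Optional[str]]:
    """Return (status, redirect target or None) for a requested path."""
    if path in LIVE:
        return HTTPStatus.OK, None
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]
    if path in GONE:
        return HTTPStatus.GONE, None
    # Only URIs that were never published should end up here.
    return HTTPStatus.NOT_FOUND, None
```

Whatever naming scheme is chosen, the cost being signed up to is keeping the MOVED and GONE tables (or their equivalent) accurate for as long as the URIs are out in the wild.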

M.

[0] http://www.w3.org/Provider/Style/URI
[1] http://www.sitepoint.com/search-engine-friendly-urls/

-- 
Mo McRoberts - Technical Lead - The Space,
0141 422 6036 (Internal: 01-26036) - PGP key CEBCF03E,
Project Office: Room 7083, BBC Television Centre, London W12 7RJ

Received on Friday, 30 December 2011 09:07:51 UTC