Re: requesting feeback about urls for a library catalog from Alexander Johannesen on 2011-06-08 (semantic-web@w3.org from June 2011)

From: Alexander Johannesen <alexander.johannesen@gmail.com>
Date: Thu, 9 Jun 2011 09:15:58 +1000
To: Alexander Johannesen <alexander.johannesen@gmail.com>, semantic-web@w3.org, adrien.dimascio@logilab.fr
Cc: adim@logilab.fr
Message-ID: <BANLkTi=s_pCwHtA8ncqT-hYumGFovw5kBQ@mail.gmail.com>
Hiya,

Ok, I think I understand what you're trying to do. Here's my .2 AUD ;

Nicolas Chauvat <nicolas.chauvat@logilab.fr> wrote:
> http://data.thelibrary.com/1234/victor_hugo is the url of a document
> that describes the person, describes its work, and links to other
> documents that provide detailed information about the two.

The first thing that strikes me is that you've created an id scheme
that will rather easily break. The added human notion of a readable
form does not, in fact, do what you want, and a few other examples
will point this out ;

   http://data.thelibrary.com/1234/victor_hugo
   http://data.thelibrary.com/1234/hugo
   http://data.thelibrary.com/1234/frank_herbert
   http://data.thelibrary.com/2345/victor_hugo
   http://data.thelibrary.com/1234/victor_marie_hugo

Which one is the correct one? And more importantly, why? Not "because
we say so", but technically. :) What will happen at each instance? How
will you deal with the discrepancies? And notice that each of these
is its own identifier, its own URI, and there's nothing in the
description of "more readable" that denotes this kind of use.

In other words, there are canonical URIs that have a similarity to
other URIs, but it's not clear which are canonical, and I suspect, for
me, the biggest problem is that you're already using an identifier
<id> which seems unambiguous, and then you slap an added identifier on
top which may break the first, or create ambiguity.
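
To make the ambiguity concrete, here is a minimal Python sketch that
groups the example URLs above by their leading id alone. The helper
name split_id_and_slug is hypothetical, and the assumption that the
first path segment is the id comes only from the examples in this
thread, not from any real system.

```python
from urllib.parse import urlsplit

def split_id_and_slug(url):
    """Return (record_id, slug) from a URL shaped like /<id>/<slug>."""
    parts = urlsplit(url).path.strip("/").split("/")
    record_id = parts[0]
    slug = parts[1] if len(parts) > 1 else None
    return record_id, slug

urls = [
    "http://data.thelibrary.com/1234/victor_hugo",
    "http://data.thelibrary.com/1234/hugo",
    "http://data.thelibrary.com/1234/frank_herbert",
    "http://data.thelibrary.com/2345/victor_hugo",
    "http://data.thelibrary.com/1234/victor_marie_hugo",
]

# Group the URLs by id alone, ignoring the readable part.
by_id = {}
for url in urls:
    record_id, slug = split_id_and_slug(url)
    by_id.setdefault(record_id, []).append(slug)

for record_id, slugs in sorted(by_id.items()):
    print(record_id, slugs)
```

Four different readable names land on id 1234, and the same readable
name shows up under two different ids, so the readable part neither
identifies nor disambiguates anything on its own.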

> http://data.thelibrary.com/1234#foaf:Person is the url of the person
> itself.

Ok, I'm allergic to identifiers using anchors (fragment identifiers),
and the most important reason is that the anchor is only part of the
URI for the client, not the server. For you on the server side, the
following URIs are the *same* ;

   http://data.thelibrary.com/1234
   http://data.thelibrary.com/1234#foaf:Person

... unless the technology ignores the standards, and that's not a
practice I can recommend. In a web app that runs in the browser
this is probably fine, but we're talking about identifiers here that
should be the same on the client as well as the server.
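
A small Python sketch of the point, using only the standard library.
The helper request_target is a hypothetical name; it builds the path
(plus query) that a client would put on the HTTP request line, which
by design leaves the fragment out.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the path (plus query) a client sends on the request line.

    The fragment is left out on purpose: clients keep it to themselves.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

plain = "http://data.thelibrary.com/1234"
with_fragment = "http://data.thelibrary.com/1234#foaf:Person"

# The fragment exists only on the client side of the exchange.
print(urlsplit(with_fragment).fragment)
# Both URIs produce the same request as far as the server can tell.
print(request_target(plain) == request_target(with_fragment))
```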

>> Are you doing the full FRBR monty, or just a few select?
>
> A good part of it.

Ok, I'm assuming groups 1 and 2 (but group 3 possibly in the future?).
Then I'd suggest a URI scheme closer to ;

   http://data.thelibrary.com/person/1234        (canonical /person/ id)
   http://data.thelibrary.com/person/victor_hugo       (alias, with redirect)

Persistent identification shouldn't rely on text that could be
subjective. Make your identifiers completely unambiguous, and everything
else aliases of that. Then imagine that concept for the following URIs
as well ;

   http://data.thelibrary.com/corporate_body/34vb5234785v
   http://data.thelibrary.com/work/4567
   http://data.thelibrary.com/expression/3467345753
   http://data.thelibrary.com/manifestation/345f908n345n340985n345
   http://data.thelibrary.com/item/345v34v54
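
As a rough sketch of how such a scheme might resolve, assuming a
lookup table of aliases and a set of known canonical ids. The names
ALIASES, CANONICAL and resolve are hypothetical, and the choice of a
301 for the alias redirect is an assumption made for the example, not
something settled in this thread.

```python
# Hypothetical alias table: (type, readable name) -> canonical id.
ALIASES = {
    ("person", "victor_hugo"): "1234",
    ("person", "victor_marie_hugo"): "1234",
}

# Hypothetical set of known canonical (type, id) pairs.
CANONICAL = {("person", "1234"), ("work", "4567")}

def resolve(path):
    """Return (status, location) for a path shaped like /<type>/<key>."""
    segments = path.strip("/").split("/")
    if len(segments) != 2:
        return 404, None
    kind, key = segments
    if (kind, key) in CANONICAL:
        # The canonical id is served directly.
        return 200, f"/{kind}/{key}"
    if (kind, key) in ALIASES:
        # An alias never serves content itself; it points at the canonical id.
        return 301, f"/{kind}/{ALIASES[(kind, key)]}"
    return 404, None

print(resolve("/person/1234"))
print(resolve("/person/victor_hugo"))
print(resolve("/person/frank_herbert"))
```

Any number of readable names can be added or retired in the alias
table without touching the canonical identifier.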

Typification is interesting when we build URI schemes, and I prefer to
denote type in order to both split off load and help disambiguation
across the application, and then further retain an internal
structure of canonical id's to which all other id's are aliases ;

   http://data.thelibrary.com/id/4567
      (canonical form for http://data.thelibrary.com/work/4567)

>> What does readable mean?
>
> A url like <data>/1234 will not help you or me figure out what might
> be the GET-able document about.

You're getting into dangerous waters when you meld semantics and
language constructs into an identification scheme, so I would do as
explained above; have internal and external identifiers that are
unambiguous, and create an alias scheme. (This is important for two
reasons; future changes, and human stupidity.)

> I know Victor Hugo is a french author, thus <data>/1234/victor_hugo
> tells me I will GET a document about that person.

Yes, but maybe only you know this; your naming scheme here only adds
confusion to the identified item. And again note that you've already
got an identifier in there. As pointed out at the top, what happens
with these two ;

<data>/1234/victor_hugo
<data>/2345/victor_hugo

The disambiguation happens in a part of the identifier that is more
important than the part that you have introduced, even if that one
seems easier to deal with (to your subjective eyes).

>
>> > <data>/1234 redirects via HTTP 303 to <data>/1234/readable_name
>>
>> Why?
>
> Because cut-n-pasting a url like <data>/1234#foaf:Person into your
> browser will get you a document describing that person.

Sorry, I didn't understand this one.

>> > <data>/1234/readable_name redirects via HTTP 301 to <data>/1234/readable_name/
>>
>> Why?
>
> Because we want a single url for the generic document.
>
> Inspiration came from Apache that does a 301 when serving a directory:
> somedir -> somedir/ -> chains with content negotiation.

Yes, but remember why they do this, and where this came from. If you
really want URIs to better denote the uniformity of resources, go the
other way instead. A trailing slash
is a left-over (IMHO) from browsing structure *based* on the URI
(which in my world is a no-no). The slash means something akin to
'children' or 'content' of the same URI without that slash (again,
IMHO), but if you mean to identify a thing, you want to point to the
thing, not its content or children. I'd prefer the canonical URI to be
;

   <data>/1234/readable_name

rather than

   <data>/1234/readable_name/
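
Going "the other way" could look something like the following Python
sketch, where the slash-terminated form redirects to the bare one.
The function names are hypothetical and the 301 status is an
assumption chosen to mirror the Apache behaviour mentioned above.

```python
def canonical_path(path):
    """Strip trailing slashes, keeping the root path "/" intact."""
    stripped = path.rstrip("/")
    return stripped or "/"

def respond(path):
    """Return (status, location), redirecting slash-terminated paths."""
    target = canonical_path(path)
    if target != path:
        # The slash form is only an alias of the thing itself.
        return 301, target
    return 200, path

print(respond("/1234/readable_name/"))
print(respond("/1234/readable_name"))
```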

> You are right the above does not come out of the blue, here is an
> example before you start condemning and berating and wincing :)
>
> http://viaf.org/viaf/9847974/#Hugo,_Victor,_1802-1885
> http://viaf.org/viaf/9847974/rdf.xml

Notice that the anchor here isn't used as an identifier, nor are the
name, the dates, or the canonical short name form part of it. There's
the problem still of the trailing slash as part of the identifier (in
the viaf.org system), but I guess people can argue about that one. :)

> in the latter, you will read uris like
>
> http://libris.kb.se/resource/auth/206651#concept
> http://viaf.org/viaf/sourceID/SELIBR%7C206651#skos:Concept
> http://viaf.org/viaf/9847974/#foaf:Person
> etc.

So are you saying that because others are doing it wrong, that makes
it ok for you to do it wrong, too? :)

> There is also a lot to read at http://www.w3.org/2005/Incubator/lld/
> including a mailing-list on which I could post my question, but my
> goal was to get feedback from the "general semantic web public" rather
> than talk to people who have been working on exactly the same topic
> for the past year.

This is a good thing to do; ask around to as many people as you
possibly can. I feel the library sector especially could do well with
a) listening to outside parties in order to avoid the whole "invented
somewhere else" syndrome, and b) sharing their expertise in their field
with the rest of us, which we sorely need.

I hope what I've written above is taken in the spirit it was given, as
from a person who's worked in both camps for many years and seen lots
of library infrastructure reach only half of the potential it could
have had. :) The funny part is that ILS already have good
identification schemes (internal; they suck at external schemes, even
LoC and OCLC) that should be externalized in a uniform way (starting
with proper identifiers for global library institutions and their data
sets ... there are some, but they are far from uniform, extensive, or
canonical), and these things are vital if libraries hope to merge and
blend with other semantic data out there. Karen Coyle has done some
great work with RDF-ing FRBR (can't remember any URIs) which you
probably want to chase up for the backend modeling issues. I'd hope
they could in fact take a lead in creating proper identification
schemes and become a canonical governing body, but I suspect that boat
has sailed, and that aspect is sadly missing from both the FRBR and RDA
bodies of work.

Anyway, hope some of what I've said makes any sense. :)


Kind regards,

Alex
-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ ----------------------------------------------
------------------ http://www.google.com/profiles/alexander.johannesen ---
Received on Wednesday, 8 June 2011 23:16:26 UTC