Re: Persistent Documents and Locations

At 15.41 95-08-21, lazear@dockside.mitre.org wrote:
>One problem:  who validates URNs, since they are the
>longest-lived element in the UR* universe?  Like DNS,
>one can find out that a URN is no longer valid.  The issue
>then is to find out a source from which one can get the new
>form of the URN or an alternate URN or confirmation that the
>URN doesn't exist at all anymore.  This is usually an offline
>task (like calling a friend to learn what the real domain
>name is).

When I start thinking in these terms, I remember the time when
I was working with books. I worked for a small company which
ordered all books for about 25% of the schools in Sweden, i.e.
about 2000 (Sweden is a small country). We ordered them from
all publishers and importers that exist in Sweden, which is
about 3500. We had about 250000 titles in our register.

Anyway, the fact is that I saw ISBNs in use for about 10
years in the early eighties, and when I started, one really did
think that an ISBN was persistent, unique and all of that,
but that is as false as the statement that the moon is made
of cheese.

The fact is this:

Each country (or "language-group") gets an ISBN prefix. For English
literature, it's "1", for Swedish books, it's "91".

After that follows the number of a publisher (which doesn't have
to be the one that printed the book, but the one which handles the
registration). The publisher number is assigned by the local
authority responsible for the "language-group", i.e. in Sweden
(91), it's the Royal Library. For "1", I don't know who it is.
Each publisher might have several publisher codes. The length of
the publisher number (the number of digits) is determined by the
leading digit(s) of the publisher number.
The maximum length is (8 - "length of language group") digits.

In Sweden, if the publisher ID starts with "1", it's only
one digit. If it starts with "2"-"4" it's two digits, "5" gives
a three-digit number and so on. It is possible that "63" starts
a four-digit number and "64" a five-digit one.

After the publisher number follows the number of the book,
and after that a checksum, which is a number 0-9 or the letter
"X".

An ISBN is always 10 digits (including the checksum).
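
The message does not say how the checksum is computed. As a minimal
sketch, the standard 10-digit ISBN rule is a weighted sum modulo 11:
the digits are weighted 10 down to 1 and the total must be divisible
by 11, which is why the last position can be "X" (the value 10). The
function names below are illustrative, not taken from any library.

```python
DIGITS = "0123456789"


def isbn10_check_digit(first_nine):
    """Return the check character for the first nine digits of an ISBN."""
    if len(first_nine) != 9 or any(c not in DIGITS for c in first_nine):
        raise ValueError("expected exactly nine digits")
    # Weights run 10, 9, ..., 2 over the nine digits; the check digit
    # (weight 1) brings the total to a multiple of 11.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn):
    """Check a 10-character ISBN, with or without "-" separators."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10 or any(c not in DIGITS for c in chars[:9]):
        return False
    return isbn10_check_digit(chars[:9]) == chars[9]
```

Note that a valid checksum only says the number was typed correctly;
it says nothing about whether the number is still assigned to the
same book.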

As a result of all these rules, the "-" you often see
in the number is not needed, as the lengths of the language-group
and the publisher ID are known once you have read the
first digit(s).
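
A rough sketch of that parsing, assuming only the rules recalled
above for group "91" (publisher IDs starting with "1" are one digit,
"2"-"4" two digits, "5" three digits). The table GROUPS and the
function split_isbn10 are hypothetical names, and the table is
deliberately incomplete; it is not the official range table.

```python
# Publisher-ID length by leading digit, per group prefix.
# Only the rules stated in the message are included (an assumption,
# not the official ISBN ranges).
GROUPS = {
    "91": {"1": 1, "2": 2, "3": 2, "4": 2, "5": 3},
}


def split_isbn10(isbn):
    """Split an ISBN into (group, publisher, title, check) parts."""
    chars = isbn.replace("-", "")
    if len(chars) != 10:
        raise ValueError("an ISBN has 10 characters")
    for group, rules in GROUPS.items():
        if chars.startswith(group):
            rest = chars[len(group):]
            length = rules.get(rest[0])
            if length is None:
                raise ValueError("no length rule for this publisher ID")
            # Whatever is left between publisher ID and checksum is
            # the number of the book.
            return group, rest[:length], rest[length:-1], chars[-1]
    raise ValueError("unknown group prefix")
```

For example, split_isbn10("9150012345") gives the parts "91", "500",
"1234" and "5" without needing any "-" in the input. The checksum is
not verified here.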

Ok, that was the technical part.

Who is now responsible for the part which is the number of
the book? Well, the truth is that it's the publisher and no one else!
If the publisher decides to change the ISBN of a book when he
is doing "just" a new printing, he can do that, even though it's
more usual for the ISBN to change when they print a new
edition (not just a new printing).

Some publishers follow these simple rules, and when they run out
of numbers, they ask for a new publisher ID (in Sweden, from the
Royal Library).

Sometimes, it's more convenient for a publisher to stay within his
series of ISBNs. He does that by reusing ISBNs!

So, what I found is that

(1) The lifetime of an ISBN is sometimes as short as _one_ single
    printing of a book.
(2) An ISBN is far from guaranteed to be unique.
(3) There is nothing like a "referral" from an old number
    to a new one; you have to call the publisher himself.

I hope that URNs will be more persistent and have longer lifetimes
than ISBNs, but I also want to include ISBNs in the URN space:
precisely because there are problems with the ISBN architecture,
there is a need for an easier way of finding a book given an ISBN.
If we have referrals in the URN tree, i.e. automatic referrals from
one node in the URN hierarchy to another one, we have gained a lot.
The URN itself then doesn't have to live forever, but when it dies
it must be replaced by a referral.
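
The referral idea can be sketched as a lookup that follows a chain
from a retired name to its replacement until it reaches a name that
still resolves. Everything here is an assumption for illustration:
the two dictionaries stand in for whatever a real resolution service
would store, and resolve is a hypothetical name.

```python
def resolve(urn, locations, referrals, max_hops=10):
    """Follow referrals from urn until a name with a location is found.

    locations maps live names to where the resource can be fetched;
    referrals maps retired names to the names that replaced them.
    Returns (final_name, location).
    """
    seen = set()
    current = urn
    for _ in range(max_hops + 1):
        if current in locations:
            return current, locations[current]
        if current in seen:
            raise LookupError("referral loop at " + current)
        seen.add(current)
        if current not in referrals:
            # The situation with ISBNs today: the trail simply ends.
            raise LookupError("no location and no referral for " + current)
        current = referrals[current]
    raise LookupError("too many referrals")
```

The hop limit and the loop check matter because nobody validates the
referrals that each "publisher" registers.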

The other thing is that the publishers are responsible for the
actual ISBN. We will end up with the same structure for URNs.
There is no way we can introduce a group of people who check the
algorithms used when everyone in the world invents URNs. When my
email client sends mail, I don't know, and as a user I don't care,
whether the URN my mail gets will be unique forever. It's when I try
to "fetch" an unknown mail that I want to be able to find it, i.e.
long after I actually started to send mail myself. Who is checking
the algorithm used by the email client?

I think the answer to your question is that all of us who implement
URNs are each responsible for providing good algorithms which
the "publisher" can then use. The "publisher" in turn is later
responsible for the resolution (he might buy or lease this part
from a third party), but there is nothing we can do if he
cannot do that.

What do we do today when an NS record points to a host that
doesn't respond? Well, we look at the SOA and send email to the
address that is there, or if that is impossible, we use Whois to
find the telephone number of the person responsible, if we are that
ambitious.
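
For reference, the address in the SOA is stored as a domain name
(the RNAME field), where the first unescaped "." stands for the "@".
A small sketch of turning it back into a mailbox; rname_to_email is
a hypothetical helper and the example names are made up.

```python
def rname_to_email(rname):
    """Convert an SOA RNAME such as "hostmaster.example.com." to an
    email address such as "hostmaster@example.com"."""
    name = rname.rstrip(".")  # drop the trailing root dot, if any
    local = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\" and i + 1 < len(name):
            # An escaped character (typically "\.") belongs to the
            # local part of the address.
            local.append(name[i + 1])
            i += 2
        elif ch == ".":
            # The first unescaped dot separates local part and domain.
            return "".join(local) + "@" + name[i + 1:]
        else:
            local.append(ch)
            i += 1
    raise ValueError("no domain part in RNAME")
```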

The URN space will not be better than the responsible persons
make it. DNS is stable just because the individual system
administrator really needs it himself. Will we have the same
situation for URNs? I hope so, but I am afraid it will take time.

    Patrik

Received on Saturday, 26 August 1995 12:16:16 UTC