Re: XBL Namespace uses the data: scheme

Without making the argument myself, I'll point you to some relevant
information:

http://www.w3.org/Provider/Style/URI
We have so much material that we can't keep track of what is out of date and
what is confidential and what is valid and so we thought we'd better just
turn the whole lot off.

That I can sympathize with - the W3C went through a period like that, when
we had to carefully sift archival material for confidentiality before making
the archives public. The solution is forethought - make sure you capture
with every document its acceptable distribution, its creation date and
ideally its expiry date. Keep this metadata.

http://www.nsf.gov/pubs/1998/nsf9814/nsf9814.htm

Looking at this one, the "pubs/1998" header is going to give any future
archive service a good clue that the old 1998 document classification scheme
is in progress. Though in 2098 the document numbers might look different, I
can imagine this URI still being valid, and the NSF or whatever carries on
the archive not being at all embarrassed about it.
...

So what should I do? Designing URIs

It is the duty of a Webmaster to allocate URIs which you will be able to
stand by in 2 years, in 20 years, in 200 years. This needs thought, and
organization, and commitment.

URIs change when there is some information in them which changes. It is
critical how you design them. (What, design a URI? I have to design URIs?
Yes, you have to think about it.) Designing mostly means leaving
information out.

The creation date of the document - the date the URI is issued - is one
thing which will not change. It is very useful for separating requests which
use a new system from those which use an old system. That is one thing with
which it is good to start a URI. If a document is in any way dated, even
though it will be of interest for generations, then the date is a good
starter.
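
As a rough illustration of "start with the creation date, then leave
everything else out", here is a minimal Python sketch. The function name
mint_uri_path, the identifier "report-17" and the date are hypothetical and
do not come from the quoted page; the only point is that the path carries
the year of issue and an opaque local name, and nothing about topic, author
or status.

```python
from datetime import date


def mint_uri_path(created: date, local_id: str) -> str:
    """Return a path made of the creation year plus an opaque local name.

    Hypothetical helper: the year is fixed at the moment the URI is
    issued, so it never needs to change afterwards.
    """
    if not local_id or "/" in local_id:
        raise ValueError("local_id must be a non-empty single path segment")
    return f"/{created.year:04d}/{local_id}"


# Illustrative values only.
print(mint_uri_path(date(1998, 3, 1), "report-17"))  # prints /1998/report-17
```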

What to leave out

Everything! After the creation date, putting any information in the name is
asking for trouble one way or another.

Topics and Classification by subject

I'll go into this danger in more detail as it is one of the more difficult
things to avoid. Typically, topics end up in URIs when you classify your
documents according to a breakdown of the work you are doing. That breakdown
will change. Names for areas will change. At W3C we wanted to change
"MarkUp" to "Markup" and then to "HTML" to reflect the actual content of the
section. Also, beware that this is often a flat name space. In 100 years are
you sure you won't want to reuse anything? We wanted to reuse "History" and
"Stylesheets" for example in our short life.

...

Effectively, when you use a topic name in a URI you are binding yourself to
some classification. You may in the future prefer a different one. Then, the
URI will be liable to break.

A reason for using a topic area as part of the URI is that responsibility
for sub-parts of a URI space is typically delegated, and then you need a
name for the organizational body - the subdivision or group or whatever -
which has responsibility for that sub-space. This is binding your URIs to
the organizational structure. It is typically safe only when protected by a
date further up the URI (to the left of it): 1998/pics can be taken to mean
for your server "what we meant in 1998 by *pics*", rather than "what in 1998
we did with what we now refer to as *pics*."
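
One way to picture the "protected by a date" idea is a lookup keyed on the
year and the name together, so the same name can be bound again under a
later year without disturbing the older URI. This is only a sketch under
assumptions: the table NAME_MEANINGS, its entries and the resolve function
are hypothetical and are not taken from the quoted page or from any real
server configuration.

```python
# Hypothetical bindings: each (year, name) pair is assigned once and kept.
NAME_MEANINGS = {
    (1998, "pics"): "content-rating-archive",  # what "pics" meant in 1998
    (2006, "pics"): "photo-gallery",           # a later, unrelated reuse
}


def resolve(path: str) -> str:
    """Map a /<year>/<name> path to whatever that name meant in that year."""
    parts = path.strip("/").split("/")
    if len(parts) < 2 or not parts[0].isdigit():
        raise ValueError(f"expected /<year>/<name>, got {path!r}")
    key = (int(parts[0]), parts[1])
    if key not in NAME_MEANINGS:
        raise KeyError(f"no binding for {path!r}")
    return NAME_MEANINGS[key]


print(resolve("/1998/pics"))  # prints content-rating-archive
print(resolve("/2006/pics"))  # prints photo-gallery
```

Because the year is part of the key, reusing the name later adds a new
entry instead of changing what the old URI refers to.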

...

Conclusion

Keeping URIs so that they will still be around in 2, 20 or 200 or even 2000
years is clearly not as simple as it sounds. However, all over the Web,
webmasters are making decisions which will make it really difficult for
themselves in the future. Often, this is because they are using tools whose
task is seen as to present the best site in the moment, and no one has
evaluated what will happen to the links when things change. The message here
is, however, that many, many things can change and your URIs can and should
stay the same. They only can if you think about how you design them.

Received on Thursday, 29 June 2006 18:38:57 UTC