Minting URIs: how to deal with unknown data structures

Hello,

Some newbie questions here...

I have recently come in contact with the concept of Linked Data and I 
have become enthusiastic. I would like to promote the idea within my 
company (we specialize in geographical data) and within my country. I 
have read the excellent Linked Data book (“Linked Data: Evolving the Web 
into a Global Data Space”) and I think I am almost ready to start 
publishing Linked Data. I understand that it is important to get the 
URIs right, and not have to change them later. That is what my questions 
are about.

I have acquired the first part (authority) of my URIs, let's say it is 
lod.mycompany.com. Now I am faced with the question: How do I come up 
with a URI scheme that will stand the test of time? I think I will start 
with publishing some FOAF data of myself and co-workers. And then 
hopefully more and more data will follow. At this moment I cannot 
possibly imagine which types of data we will publish. They are likely to 
have some kind of geographical component, but that is true for a lot of 
data. I believe it is not possible to come up with any hierarchical 
structure that will accommodate all types of data that might ever be 
published.

So I think it is best to leave out any indication of data organization 
in the path element of the URI (i.e. http://lod.mycompany.com/people is 
a bad idea). In my understanding, I could use base URIs like 
http://lod.mycompany.com/resource, http://lod.mycompany.com/page and 
http://lod.mycompany.com/data, and then use unique identifiers for all 
the things I want to publish something about. If I understand correctly, 
I don't need the URI to describe the hierarchy of my data because all 
Linked Data are self-describing. Nice.
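
To make the pattern concrete, here is a minimal sketch of what I have in 
mind, assuming the three base paths are /resource, /page and /data. The 
helper name uris_for and the identifier "a1b2c3" are made up purely for 
illustration.

```python
# Sketch only: one opaque identifier, three URIs (thing, HTML page, RDF data).
BASE = "http://lod.mycompany.com"  # the authority mentioned above


def uris_for(identifier: str) -> dict:
    """Return the three URIs that belong to one identified thing."""
    return {
        "resource": f"{BASE}/resource/{identifier}",  # the thing itself
        "page": f"{BASE}/page/{identifier}",          # human-readable description
        "data": f"{BASE}/data/{identifier}",          # machine-readable description
    }


# "a1b2c3" is a hypothetical identifier, not a real one.
for kind, uri in uris_for("a1b2c3").items():
    print(kind, uri)
```

Nothing in the path says whether "a1b2c3" is a person, a road or a 
municipality; that would be stated in the data itself.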

But then I am faced with the problem: What method do I use to mint my 
identifiers? Those identifiers need to be unique. Should I use a number 
sequence, or a hash function? In those cases the URIs would be uniform 
and give no indication of the type of data. But a number sequence seems 
unsafe, and in the case of a hash function I would still need to make 
some kind of structured choice of input values.
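
The two options I am weighing could be sketched roughly as follows. This 
is only an illustration under my own assumptions: the function names, the 
"source" and "local_key" inputs, and the truncation to 12 hex characters 
are all hypothetical choices, not an established practice.

```python
import hashlib
import itertools

# Option 1: a plain number sequence (in-memory here; a real setup would
# need persistent storage so numbers are never handed out twice).
_counter = itertools.count(1)


def mint_sequential() -> str:
    """Return the next number in the sequence as an identifier."""
    return str(next(_counter))


# Option 2: a hash over structured input. The structure moves out of the
# URI and into the choice of input values (hypothetical: source + key).
def mint_hashed(source: str, local_key: str) -> str:
    """Return a short, deterministic identifier derived from the inputs."""
    raw = f"{source}|{local_key}".encode("utf-8")
    # Truncating the digest keeps URIs short but raises the collision risk.
    return hashlib.sha1(raw).hexdigest()[:12]
```

With the hash, the same input always gives the same identifier, which is 
convenient for regenerating data, but it also means I must never change 
the way the input values are composed.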

I would welcome any advice on this topic from people who have had some 
more experience with publishing Linked Data.

Regards,
Frans Knibbe

Received on Friday, 15 April 2011 17:03:53 UTC