Re: Minting URIs is bad? from Dan Brickley on 2009-02-02 (public-lod@w3.org from February 2009)

From: Dan Brickley <danbri@danbri.org>
Date: Mon, 02 Feb 2009 08:52:31 +0100
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Sergio Fernández <sergio.fernandez@fundacionctic.org>, Michael Hausenblas <michael.hausenblas@deri.org>, Linked Data community <public-lod@w3.org>, void-rdfs-internals@googlegroups.com
Message-ID: <4986A63F.7040904@danbri.org>

Sergio:
> do we want to create this (artificial) URIs?

Richard:

 > You don't state any reasons against using URIs, you just say that you
 > prefer not to use them. So please clarify: What do you gain by not
 > introducing your own URI?

There are a few considerations...

One reason to avoid creating artificial URIs is when we do not want 
to raise expectations about their longevity or maintenance.

Another is when we don't want to confuse others about the 'real' / main 
/ official URI, i.e. we suspect the things have well-known identifiers, 
we just don't know what they are. Or perhaps we have other reasons 
(business, IP etc.) for not yet publishing the real identifiers.

These two cases can be addressed by providing some more minimal metadata 
about the identifiers. For example, that everything beginning 
http://tmpid.danbri.org/ is transient and may not be dereferenceable 
after 2 weeks. Some pieces of POWDER might be re-usable here.
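
As a rough illustration of that kind of minimal metadata, here is a 
Python sketch of a prefix-based lifetime policy. The table, the function 
names and the idea of recording a minting time are all assumptions made 
for the example; this is not POWDER syntax, just the same idea in a few 
lines.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy table: URI prefix -> how long identifiers under
# that prefix are promised to stay dereferenceable after minting.
TRANSIENT_PREFIXES = {
    "http://tmpid.danbri.org/": timedelta(weeks=2),
}


def expiry_for(uri, minted_at):
    """Return when `uri` may stop resolving, or None if no policy applies."""
    for prefix, lifetime in TRANSIENT_PREFIXES.items():
        if uri.startswith(prefix):
            return minted_at + lifetime
    return None


def may_have_expired(uri, minted_at, now):
    """True if `uri` falls under a transient prefix and its lifetime is over."""
    expiry = expiry_for(uri, minted_at)
    return expiry is not None and now > expiry


if __name__ == "__main__":
    minted = datetime(2009, 2, 2, tzinfo=timezone.utc)
    print(expiry_for("http://tmpid.danbri.org/abc", minted))
```

A consumer that sees a URI outside any listed prefix gets None back, 
i.e. no statement about longevity either way.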

A third case (not directly Void-related) is where the entity being 
identified is a Person or other entity that has associated social or 
business sensitivities.

If I convince the world that http://ids.danbri.org/richard_cyganiak is a 
fine identifier for the person whose personal mailbox is 
richard@cyganiak.de, then I put myself in some position of advantage 
(and responsibility) with respect to online information-linking 
regarding that person. My webserver sees every de-referencing of the 
URI. I see timing information, HTTP REFERER, HTTP USER AGENT, and more. 
I also probably have some responsibility to publish accurate 
(non-libelous, etc.) information. This covers the nature of RDF claims I 
intentionally publish (e.g. there have been various cases like 
http://news.cnet.com/2100-1025_3-5984880.html w.r.t. Wikipedia accuracy; 
DBpedia re-users should bear this in mind). But it also covers things 
like server security. If the server is hacked or otherwise compromised, 
the descriptions served at the URI are at risk.
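
To make the "my webserver sees every de-referencing" point concrete, 
here is a small Python sketch of what the host of an identifier could 
record per request. The function name and the shape of the log entry are 
assumptions for illustration; real servers log this in their own 
formats.

```python
from datetime import datetime, timezone


def dereference_log_entry(uri_path, headers, when):
    """Collect what the identifier's host learns from one HTTP lookup.

    `headers` is a plain dict of request headers; names are matched
    case-insensitively, as HTTP header names are.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    return {
        "path": uri_path,
        "time": when.isoformat(),
        # Page or document the client followed the link from, if sent.
        "referer": normalized.get("referer"),
        # Software making the request, if sent.
        "user_agent": normalized.get("user-agent"),
    }


if __name__ == "__main__":
    entry = dereference_log_entry(
        "/richard_cyganiak",
        {"Referer": "http://example.org/page", "User-Agent": "ExampleBot/1.0"},
        datetime(2009, 2, 2, 7, 52, 31, tzinfo=timezone.utc),
    )
    print(entry)
```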

Also, if the URIs are http: rather than https: because someone didn't 
want to run SSL or pay an admin fee for a certificate check, the data 
service is less reliable (a faked wifi access point could substitute bad 
data, for example). For many cases on the Web, this is not a big deal. But when 
you are claiming that some URI serves as a reliable "identifier" for the 
thing it describes, there are extra layers of care and expectation to 
consider.

The authenticity aspects of this 3rd case can probably be addressed, at 
least partially, with digital signatures. I have been poking around XML 
Signature lately. The privacy aspect is harder. Parties who claim to be 
publishing URI identifiers for entities such as people, businesses, or 
content owned by others, should at least have very clear 
terms-of-service and privacy policy documents. This is easier said than 
done, particularly in large or legally cautious organizations. Or with 
informal opensource-style projects for that matter.

In such scenarios, uuid:, tag: or description-by-reference 
identification practices still have some value. But I agree, everything 
goes much more smoothly when we have the luxury of a nice URI to join 
the data with!
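
For what it's worth, minting identifiers of the uuid: and tag: kind 
needs no webserver at all. A minimal Python sketch follows, assuming the 
urn:uuid: form for UUIDs and the tag:authority,date:specific layout of 
the tag: scheme; the helper names and the example authority and path are 
made up for illustration.

```python
import uuid
from datetime import date


def mint_uuid_urn():
    """Mint a random, non-dereferenceable identifier of the form urn:uuid:..."""
    return uuid.uuid4().urn


def mint_tag_uri(authority, on, specific):
    """Build a tag: URI as tag:<authority>,<date>:<specific>.

    `authority` is a domain name or email address controlled by the
    minter on the date `on`; `specific` is any local name they choose.
    """
    return f"tag:{authority},{on.isoformat()}:{specific}"


if __name__ == "__main__":
    print(mint_uuid_urn())
    print(mint_tag_uri("danbri.org", date(2009, 2, 2), "people/example"))
```

Neither form can be looked up over HTTP, so neither raises expectations 
about hosting, nor lets the minter observe who is using it.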

cheers,

Dan

--
http://danbri.org/

Received on Monday, 2 February 2009 07:53:16 UTC