Re: Minting URIs is bad? from Hugh Glaser on 2009-02-03 (public-lod@w3.org from February 2009)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Tue, 3 Feb 2009 01:13:22 +0000
To: Richard Cyganiak <richard@cyganiak.de>, Dan Brickley <danbri@danbri.org>
CC: Sergio Fernández <sergio.fernandez@fundacionctic.org>, Linked Data community <public-lod@w3.org>
Message-ID: <EMEWEMEW2_DELIMl121DXb02ecec7f13d9e77b51be0,hg@ecs.soton.ac.uk,C5AD4AB2.29233>
Wow. A couple of great messages.
Interestingly (for me), I read Dan's message not as antagonistic to the minting of URIs, but rather as an excellent discussion of some of the issues.
(Re-reading it, I find I may not have given sufficient importance to the statements about "avoid creating artificial URIs".)
Anyway, whatever Dan's opinion, I think I am in great agreement with Richard.

We can argue about whether minting URIs is good or bad, but in the end it doesn't matter.
URIs will be minted in vast quantities by all sorts of people, confronting all the considerations that Dan raises, and more (assuming the world we expect comes to pass). Suggesting that there can be any control of all this is like, I don't know, maybe suggesting that bad HTML should be rejected. These things, as Richard says, are "a fact of life on the Web".
So I read Dan's email as discussing some of the exciting challenges that we have to face, which we must solve by embracing them and building them into our distributed systems architectures.

Failure to meet these challenges is likely to result in failure of the whole project, and others will take up the challenge and overcome it (which I guess might be OK?).

Best
Hugh


On 03/02/2009 00:44, "Richard Cyganiak" <richard@cyganiak.de> wrote:



Dan,

Executive summary: Yes, there will be crappy, worthless, unstable and
ambiguous identifiers on the Web of Data. Actually, 90% of them will
be. The correct response to this is not: "Don't mint identifiers
unless you can guarantee X, Y and Z!" The correct response is: "Don't
trust and re-use every random identifier you find."


Now for the details. You raise three points against minting URIs. Let
me paraphrase them and respond to each one.


"1. Once you mint and publish a URI, you implicitly make a commitment
to keep that URI stable, which can be costly."

Broken links are a fact of life on the Web, and experience on the WWW
has taught us pretty good heuristics for estimating the stability of
URIs, and it has taught us not to rely on a URI unless we have some
reason to believe that it will be stable. So I don't think that the
act of minting and publishing a URI in itself creates a social
obligation for its owner towards keeping it stable. The obligation
comes from elsewhere: from your desire to be (or remain) a reputable
and reliable source of URIs, or from you advertising your dataset as a
high-quality and stable data source.


"2. Once you mint and publish a URI that names something you don't
"own", others might mistakenly believe that your URI is "official". If
they start to use it like an "official" URI provided by the owner,
then this might put unwanted expectations and obligations on you."

This can be solved by a simple RDF property stating "this document is
NOT published by X", that can be used on a document whenever there is
potential for confusion as to whether a URI is "official" or not. This
is actually a sort of nifty idea.
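Richard's suggested disclaimer can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not an existing vocabulary: the property URI `http://example.org/vocab#notPublishedBy` and the document and publisher URIs are hypothetical placeholders. It emits the statement as a single N-Triples line and performs a deliberately naive check for it.

```python
# Hypothetical property URI for "this document is NOT published by X";
# no standard vocabulary term is implied.
NOT_PUBLISHED_BY = "http://example.org/vocab#notPublishedBy"


def iri(value: str) -> str:
    """Render an IRI as an N-Triples term."""
    return f"<{value}>"


def disclaimer_triple(document: str, party: str) -> str:
    """Return one N-Triples line stating that `document` is not published by `party`."""
    return f"{iri(document)} {iri(NOT_PUBLISHED_BY)} {iri(party)} ."


def disclaims(ntriples: str, document: str, party: str) -> bool:
    """Naive line match: only recognises the exact form disclaimer_triple produces.

    A real consumer would parse the RDF properly instead of comparing strings.
    """
    wanted = disclaimer_triple(document, party)
    return any(line.strip() == wanted for line in ntriples.splitlines())


if __name__ == "__main__":
    doc = "http://ids.example.org/doc/person1"        # hypothetical unofficial document
    owner = "http://example.com/official-publisher"   # hypothetical "official" party
    data = disclaimer_triple(doc, owner)
    print(data)
    print(disclaims(data, doc, owner))
```

A consumer that finds such a triple on a document knows not to treat its URIs as the owner's official ones; the string matching here only stands in for proper RDF parsing.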


"3. Once you mint and publish a URI for a social entity (person,
business), and successfully advertise it as a reliable identifier for
the entity, then this puts you into a position of responsibility (and
power) to provide a high-quality, informative and secure data service
connected to the URI."

Minting a URI gives you neither much power nor much responsibility.
Only as people grow to rely on your page and start to trust you do you
acquire both. And this rarely happens accidentally; it happens when
people design for success and build great services. The chances that
some random URI which I minted to name you will acquire such
importance are zero.


I see minting a URI as a completely casual activity, not greater in
responsibility than, say, publishing a message on Twitter or posting a
comment on a YouTube video. The argument that "introducing an
identifier into the world bestows great responsibility upon you" is
harmful in my eyes, and actually borders on FUD. Is the intent to keep
the right to mint URIs in the hands of some Select Few Who Know How To
Do It Properly? I hope not. I'd rather spread a message that
encourages people to put linkable data out there, rather than warning
them about the 53 things that they should worry about each time they
touch RDF.

Best,
Richard



On 2 Feb 2009, at 07:52, Dan Brickley wrote:

> Sergio:
>> do we want to create these (artificial) URIs?
>
> Richard:
>
> > You don't state any reasons against using URIs, you just say that
> > you prefer not to use them. So please clarify: What do you gain by not
> > introducing your own URI?
>
> There are a few considerations...
>
> One reason to avoid creating artificial URIs is when we do not
> want to raise expectations about their longevity and maintenance.
>
> Another is when we don't want to confuse others about the 'real' /
> main / official URI, i.e. we suspect the things have well-known
> identifiers but just don't know what they are. Or perhaps we have other
> reasons (business, IP etc.) for not yet publishing the real
> identifiers.
>
> These two cases can be addressed by providing some minimal extra
> metadata about the identifiers. For example, stating that everything
> beginning http://tmpid.danbri.org/ is transient and may not be
> dereferenceable after 2 weeks. Some pieces of POWDER might be
> reusable here.
>
> A third case (not directly Void-related), is where the entity being
> identified is a Person or other entity that has associated social or
> business sensitivities.
>
> If I convince the world that http://ids.danbri.org/richard_cyganiak
> is a fine identifier for the person whose personal mailbox is
> richard@cyganiak.de, then I put myself in some position of advantage (and
> responsibility) with respect to online information-linking regarding
> that person. My webserver sees every dereference of the URI. I
> see timing information, HTTP REFERER, HTTP USER AGENT, and more. I
> also probably have some responsibility to publish accurate
> (non-libelous, etc.) information. This covers the nature of the RDF
> claims I intentionally publish (e.g. there have been various cases
> like http://news.cnet.com/2100-1025_3-5984880.html w.r.t. Wikipedia
> accuracy; DBpedia re-users should bear this in mind). But it also
> covers things like server security. If the server is hacked or
> otherwise compromised, the descriptions served at the URI are at risk.
>
> Also, if the URIs are http: rather than https: because someone didn't
> want to run SSL or pay an admin fee for a certificate check, the
> data service is less reliable (faked wifi access could substitute
> bad data, for example). For many cases on the Web, this is not a big
> deal. But when you are claiming that some URI serves as a reliable
> "identifier" for the thing it describes, there are extra layers of
> care and expectation to consider.
>
> The authenticity aspects of this 3rd case can probably be addressed,
> at least partially, with digital signature. I have been poking
> around XML Signature lately. The privacy aspect is harder. Parties
> who claim to be publishing URI identifiers for entities such as
> people, businesses, or content owned by others, should at least have
> very clear terms-of-service and privacy policy documents. This is
> easier said than done, particularly in large or legally cautious
> organizations. Or with informal opensource-style projects for that
> matter.
>
> In such scenarios, uuid:, tag: or description-by-reference
> identification practices still have some value. But I agree,
> everything goes much more smoothly when we have the luxury of a nice
> URI to join the data with!
>
> cheers,
>
> Dan
>
> --
> http://danbri.org/
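Dan's transience metadata (everything under http://tmpid.danbri.org/ may stop resolving after two weeks) could be consumed by a client roughly as follows. This is a minimal Python sketch under stated assumptions: the `TransienceRule` structure, the hard-coded rule tuple and the function names are hypothetical, and a real deployment would read such rules from metadata the publisher provides (POWDER being one candidate Dan mentions) rather than from code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class TransienceRule:
    """Hypothetical publisher metadata: URIs under `prefix` are promised for `lifetime` only."""

    prefix: str
    lifetime: timedelta


# Hypothetical rule mirroring Dan's example; normally read from published metadata.
RULES: Sequence[TransienceRule] = (
    TransienceRule("http://tmpid.danbri.org/", timedelta(weeks=2)),
)


def matching_rule(uri: str, rules: Sequence[TransienceRule] = RULES) -> Optional[TransienceRule]:
    """Return the rule with the longest prefix covering `uri`, or None."""
    covering = [r for r in rules if uri.startswith(r.prefix)]
    return max(covering, key=lambda r: len(r.prefix), default=None)


def promise_expired(uri: str, minted_at: datetime, now: datetime,
                    rules: Sequence[TransienceRule] = RULES) -> bool:
    """True if `uri` is declared transient and its promised lifetime has passed."""
    rule = matching_rule(uri, rules)
    if rule is None:
        # No transience metadata: nothing was declared, which is NOT a stability guarantee.
        return False
    return now - minted_at > rule.lifetime


if __name__ == "__main__":
    minted = datetime(2009, 2, 2, 7, 52, tzinfo=timezone.utc)
    print(promise_expired("http://tmpid.danbri.org/x1", minted, minted + timedelta(days=15)))
```

A `False` result for a URI with no matching rule only means no expiry was declared; in line with Richard's point, it is not evidence that the URI is stable.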
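The uuid: and tag: alternatives Dan mentions can be minted locally, without running a server that would observe every lookup. A minimal Python sketch: `uuid.uuid4().urn` from the standard library yields the registered `urn:uuid:` form (RFC 4122) rather than a bare `uuid:` scheme, and the tag builder follows the `tag:authority,date:specific` layout of RFC 4151. The authority `example.org` and the specific part `person/1` are hypothetical, and the builder does no validation of its inputs.

```python
import uuid
from datetime import date


def uuid_urn() -> str:
    """Mint a random (version 4) UUID and return it as a urn:uuid: string."""
    return uuid.uuid4().urn


def tag_uri(authority: str, minted_on: date, specific: str) -> str:
    """Build tag:<authority>,<YYYY-MM-DD>:<specific> (RFC 4151 layout; inputs not validated)."""
    return f"tag:{authority},{minted_on.isoformat()}:{specific}"


if __name__ == "__main__":
    print(uuid_urn())
    print(tag_uri("example.org", date(2009, 2, 2), "person/1"))
```

Neither form is dereferenceable, which avoids the privacy exposure of an http: URI but also gives up the "look it up" benefit Dan acknowledges at the end of his message.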
Received on Tuesday, 3 February 2009 01:14:12 UTC