Re: Perspective on the metadata / discovery struggle from Jonathan Rees on 2011-06-30 (www-tag@w3.org from June 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Thu, 30 Jun 2011 16:29:04 -0400
To: Dave Reynolds <dave.e.reynolds@gmail.com>
Cc: www-tag@w3.org
Message-ID: <BANLkTinc_v3c+m_+qZb=SuzZOiAqCXQ9HQ@mail.gmail.com>

On Thu, Jun 30, 2011 at 1:13 PM, Dave Reynolds
<dave.e.reynolds@gmail.com> wrote:
> [Apologies if responding here isn't appropriate, not sure of the
> etiquette of this list.]
>
> On Thu, 2011-06-30 at 10:32 -0400, Jonathan Rees wrote:
>> I had a thought about the TAG definition discovery and metadata
>> architecture issues that might be helpful.  Probably this is obvious, but
>> it wasn't to me so I thought it was worth writing down.  This relates
>> to the fact that whenever the httpRange-14 thing comes up in the TAG
>> we are confused about what issue to put it under. I was inspired to
>> think this over by F2F remarks of Larry's about the magnitude of
>> the problem.
>>
>> There are two distinct application-level communication needs:
>>
>>   1. web metadata - when I express information about a document
>>      (image, etc.) how do I say (especially in RDF) that what I am
>>      talking about is content that's accessed via a particular URI, as
>>      opposed to other content
>>
>>   2. definition discovery - given a vocabulary term (URI),
>>      how is definition-like information for it discovered
>>      (Definitions are not, in general, metadata.)
>
> For people interested in linked data I don't think #2 captures what they
> are about. They want to use a URI to denote some "thing" (including
> concepts, real world entities, measurements, data sets ...) and be able
> to get back some assertions about that thing when they dereference it.
> Those assertions don't necessarily constitute a *definition* and the
> URIs certainly aren't limited to vocabulary terms.

Thanks for your note.

This is completely consistent with what I meant, so next time around I
will try to say it better.

That is, I interpret "vocabulary term" quite inclusively, and I don't
consider "definition" and "other information also" to be exclusive.

Sorry to be confusing.

> And the Ian Davis "back to basics" proposal is essentially "the owner of
> the URI gets to choose, it is their real estate, if they say this is a
> high rise then it is a high rise - if they want to create two related
> bits of real estate, one for the high rise and one for the wetland
> that's fine too".

Well, as you know, there's no general agreement right now on what you
say, but maybe there will be in the future. The point is that URIs are
used in communication, and anyone encountering a URI may want guidance
on how it's being used, or how to use it. Communication of any kind
requires prior agreement between sender and receiver, and the
it's-up-to-the-publisher rule is only one of many possible agreements.
(For example, the data: and mid: schemes do not use
up-to-the-publisher.) If that's the rule that's to win, then we need
consensus on that approach, and on what the protocol details are.
Because some companies are now acting unilaterally, we may have lost
what little consensus we had, and all bets are off when two agents
communicate - they'll use different rules and get the wrong answer in
some cases. Either we give up completely, and let the buyer beware, or
we codify *something* and try to get consensus on it.

The publisher-decides rule has many variants, and I've seen its
proponents embrace a variety of mutually incompatible positions. So
it's not enough to strengthen or retract httpRange-14 - there has to
be consensus and a spec of some kind no matter what, if there's to be
interoperability.

I don't get what you're saying about real estate. Someone encountering
a URI in an RDF statement (such as a statement of authorship,
license, or 'likes') may want to know what it's about. The answer will
be different under different rules. To avoid getting the wrong answer
they may need to know which rule applies. Guidance comes from
specifications and general practice. If you allowed two rules to
coexist (e.g. it's-the-information-resource and publisher-decides),
there would be no way for the decoder to know which encoding was used.
See my xhv:license example in the other thread.
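
To make the ambiguity concrete, here is a minimal sketch in Python. It
is only an illustration and is not the xhv:license example from the
other thread: the example.org URIs, the rule names, and the
publisher_says table are all hypothetical. It shows one received
statement yielding two different subjects depending on which rule the
decoder assumes.

```python
# Hypothetical illustration: the rule names, URIs, and the publisher_says
# table are assumptions made up for this sketch, not part of any spec.
from typing import Dict

DOC_RULE = "information-resource"
PUBLISHER_RULE = "publisher-decides"


def referent(uri: str, rule: str, publisher_says: Dict[str, str]) -> str:
    """Return what `uri` is taken to denote under the given rule."""
    if rule == DOC_RULE:
        # The URI always denotes the content retrieved from it.
        return f"the document retrieved from {uri}"
    if rule == PUBLISHER_RULE:
        # The URI denotes whatever the publisher has declared, if anything.
        return publisher_says.get(uri, "unknown (publisher has not said)")
    raise ValueError(f"unknown rule: {rule}")


uri = "http://example.org/highrise"
publisher_says = {uri: "a high-rise building"}

# The statement as received carries no marker saying which rule the
# sender had in mind when writing it.
statement = (uri, "xhv:license", "http://example.org/some-license")

# Decode the subject of the same statement under each rule.
readings = {
    rule: referent(statement[0], rule, publisher_says)
    for rule in (DOC_RULE, PUBLISHER_RULE)
}

for rule, subject in readings.items():
    print(f"{rule}: the license applies to {subject}")
```

Nothing in the statement itself lets the receiver choose between the
two readings, which is why the rule has to be agreed on beforehand.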

It doesn't help for the publisher to decide anything if (a) it's not
generally recognized that it has that authority, (b) we don't know how
to learn from the publisher what it decided, or (c) we can't distinguish
cases in which the publisher decides from those in which it doesn't.

Harry Halpin has suggested that ties like this be tolerated,
recognized, and if necessary broken by providing additional
information. I'm open to this solution, too, if it leads to some
procedure that is reliable. It is just another rule (one that combines
two or three other rules). All I'm trying to say is that there is a
real coordination problem here, that it will take more than unilateral
designs to fix it, and that strategy and process discussion may be
more important at this point than technical debate.

Jonathan
Received on Thursday, 30 June 2011 20:29:40 UTC