Re: URIs, used in RDF, that do not have associated documentation from Jonathan A Rees on 2012-03-28 (www-tag@w3.org from March 2012)

From: Jonathan A Rees <rees@mumble.net>
Date: Wed, 28 Mar 2012 10:41:42 -0400
To: トーレエリクソン <tore.eriksson@po.rd.taisho.co.jp>
Cc: www-tag@w3.org, tore.eriksson@gmail.com
Message-ID: <CAGnGFMLyKMvofObgDLFYm5E6VqdUan=bTU2FhZAiTqRbeensmg@mail.gmail.com>
2012/3/27 トーレ　エリクソン <tore.eriksson@po.rd.taisho.co.jp>:

>> I suspect you didn't mean what you said. You might have intended to
>> classify URIs as content-oriented vs. description-oriented (or
>> representations as content vs. description), which would be a better
>> match to current and desired practice, I think. Then representations
>> wouldn't *always* be descriptions. The question then would be where to
>> draw the line.
>
> I did mean what I said. Classifying URIs as content-oriented vs.
> description-oriented is in my opinion the root problem. What I want to
> say is:
>
> * All HTTP URIs are description-oriented, even if they return a 200 *
>
> Neither Tim Berners-Lee nor I want to draw the line. His position is
> that all URIs responding with a 200 are content-oriented; my position is
> that they, as well as the 303s, are description-oriented. Both
> positions have their merits and demerits. I have mentally applied my
> position to a lot of problems that have been discussed on these mailing
> lists during the last decade, and I think it works out well in most
> cases.
>
> I know it is a radical change, but this is what I am proposing. Many
> other people have proposed versions of this position over the years,
> but it has never stuck. Apparently the point is hard to get across; even
> if you write it out in prose, people assume that you meant something
> else... I hope you understand what I'm trying to say now, and any advice
> on how to formulate this more clearly would be much appreciated.
>
> Tore

I think I now understand the proposal, and I respect the idea, and
don't want to argue its merits. I am just trying to provide ways that
the proposal can be improved. One way would be to be clear about what
you mean by a "description", which is absolutely central to the
proposal. Based on what you have written I can't really make sense of
what you mean by "description" or "describe" (and we could get into a
terminology squabble here; I am not interested in fighting about the
use of the word, I just want the proposal to be clear about what you
mean). Another way is to list under "risks" or "negatives" that
certain common uses of URIs in RDF would be deprecated - namely the
number of URIs used in extant RDF for which there is no discoverable
documentation (say this however you like: RDF describing what the URI
refers to, RDF defining or documenting the URI, etc.). I gave some
examples in the message that started this thread.

Also I don't think the change proposal details (document changes)
actually implement what you are trying to accomplish. In the hashless
general case, we already *know* that the response is a nominal
representation - this is a consequence of HTTP so is nothing new. What
you want to add is that the response either is/contains a description
of what's identified, or, if it does not, then we know nothing about
what the URI refers to, not even anything about its content or
instances, and (by implication) one had better avoid using it in RDF,
absent additional explanation.

Remember, I am *not* defending httpRange-14(a) as written or the
baseline document, which has the same problem as what you have written
- there is no way to learn anything useful about what's identified, so
if all you have is the specifications on hand, good practice would be
to *never* use 200 URIs in RDF, without additional explanation not
obtained using GET (such as hasInstanceUri). I am just saying that, to
make *me* happy, any modification of httpRange-14(a) ought to solve
the important problem, which is that you ought to be able to learn
*something* useful about what the URI identifies by doing a GET,
however weak. Knowing that the result is a representation, or that the
referent is an information resource, is not useful (the Flickr
example), but knowing (or being allowed to expect) that it has the
retrieved representation as content would be useful (as demonstrated
empirically), or alternatively knowing that it's a description would
also be useful, as you suggest. But one can't deduce this from the
change you propose.  Unless the proposal says that what you GET is a
description, or a "nominal" description or should be taken to be a
description if it looks like one or anything of that ilk, it won't
accomplish your purpose.

Let's just look at one example. Do a GET of the URI
http://purl.org/dc/elements/1.1/ and you will get some RDF.  In it you
find this statement:

<http://purl.org/dc/elements/1.1/subject>
    rdfs:isDefinedBy
    <http://purl.org/dc/elements/1.1/>.

To understand this you at least need to know *something* about what the
URI http://purl.org/dc/elements/1.1/ refers to. Of course you learn
something about it by reading the RDF at hand, but this will not always
be the case. Let's see what one might find out from a GET:

httpRange-14(a) only tells you it's an information resource - it could
be any IR at all. That's not particularly useful.

Everyone seems to understand that it means an information resource one
of whose instances is what GET http://purl.org/dc/elements/1.1/
retrieves, even though this is not written down anywhere in any
specification. I have suggested codifying this somehow, because I
think it's useful (even if you aren't sure what the *other*
representations are - in this particular case I'd be willing to bet
that there aren't others, at least not at present - but no matter).
But so far I haven't convinced anyone else; perhaps I am wrong; that
doesn't matter here.

According to the baseline after your change proposal is applied,
nothing at all is said about this case - the GET gives no information
at all about what that URI refers to. In my view that would say that
there is something inadequate ("bad practice") about that RDF and
something should be added to it to make its intent clearer - like a
hasInstanceUri assertion, for example.

According to what you seem to be saying, you would like for what is
retrieved to be taken as a description of what
http://purl.org/dc/elements/1.1/ refers to. Maybe this is obvious, but
I think it's pretty important for a specification to state the
obvious, especially in these very murky waters. So that's what I'm
recommending you do.
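
The four readings of this one example can be laid side by side in a
small Python sketch. This is only an illustration: the names `Reading`
and `learned_from_200` are hypothetical, and the strings paraphrase the
positions as described in this message rather than quoting any
specification.

```python
from enum import Enum


class Reading(Enum):
    """Hypothetical labels for the four readings discussed above."""
    HTTPRANGE_14A = "httpRange-14(a) as written"
    INSTANCE = "informal 'retrieved representation is an instance' reading"
    BASELINE_WITH_CHANGE = "baseline with the change proposal applied"
    DESCRIPTION = "'what you GET is a description' reading"


def learned_from_200(reading: Reading, uri: str) -> str:
    """Say what a client may conclude about `uri` once a GET returns 200."""
    if reading is Reading.HTTPRANGE_14A:
        return f"<{uri}> refers to some information resource, unspecified which"
    if reading is Reading.INSTANCE:
        return (f"<{uri}> refers to an information resource that has the "
                "retrieved representation as an instance")
    if reading is Reading.BASELINE_WITH_CHANGE:
        return f"nothing is said about what <{uri}> refers to"
    if reading is Reading.DESCRIPTION:
        return f"the retrieved representation describes what <{uri}> refers to"
    raise ValueError(f"unknown reading: {reading!r}")


DC = "http://purl.org/dc/elements/1.1/"
for reading in Reading:
    print(f"{reading.value}: {learned_from_200(reading, DC)}")
```

Only the last reading lets the retrieved RDF count as documentation of
the URI, and it is the one that a specification would have to state
explicitly.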

As for whether the change puts large amounts of deployed RDF at risk,
I am thinking about this one, and have not yet really decided what I
think. It seems to me that it does, but I agree with you that it was
already at risk, so the question is exactly what risks and benefits
we're talking about; especially what explains successful uses
of description-free (or description-deficient) URIs in RDF - if indeed
there are any.

This is really the same as the question of what makes the Web work at
all, whose answer is not clear. It has something to do with pervasive,
unjustified speculation or gambling that what the next person GETs
will be similar enough to what you GOT, for the purposes at hand, that
it's worth putting a URI in an href=, as opposed to holding out for
some more rigorous and reliable form of reference, of the sort a
librarian might be happy with. Logically, the Web shouldn't work, yet
somehow it does. It's very odd.

Jonathan
Received on Wednesday, 28 March 2012 14:42:15 UTC