Re: URIs, used in RDF, that do not have associated documentation

Jonathan A Rees wrote:
> I think I now understand the proposal, and I respect the idea, and
> don't want to argue its merits. I am just trying to provide ways that
> the proposal can be improved. One way would be to be clear about what
> you mean by a "description", which is absolutely central to the
> proposal. Based on what you have written I can't really make sense of
> what you mean by "description" or "describe" (and we could get into a
> terminology squabble here; I am not interested in fighting about the
> use of the word, I just want the proposal to be clear about what you
> mean). 

I'll try to clarify this.

> Another way is to list under "risks" or "negatives" that
> certain common uses of URIs in RDF would be deprecated - namely the
> number of URIs used in extant RDF for which there is no discoverable
> documentation (say this however you like: RDF describing what the URI
> refers to, RDF defining or documenting the URI, etc.). I gave some
> examples in the message that started this thread.

I'm sorry, but I don't understand why not having discoverable
documentation deprecates their use. Wouldn't this deprecate a number of
non-http URIs as well? I'm thinking of the urn scheme, and possibly
others like tag and uuid. Anyway, since I'm convinced that this is not a
problem, I won't list it under risks, but I'll try to comment on this
matter in some way.

Not being sure of what the resource is, is something inherited from the
world wide web: you can never be sure that a link target you don't own
will always return the same thing. This is inherent in the distributed
nature of the web.

> Also I don't think the change proposal details (document changes)
> actually implement what you are trying to accomplish. In the hashless
> general case, we already *know* that the response is a nominal
> representation - this is a consequence of HTTP so is nothing new. What
> you want to add is that the response either is/contains a description
> of what's identified, or, if it does not, then we know nothing about
> what the URI refers to, not even anything about its content or
> instances, and (by implication) one had better avoid using it in RDF,
> absent additional explanation.

I agree with everything except the last implication. I'll try to make
this more explicit in the text.

> Remember, I am *not* defending httpRange-14(a) as written or the
> baseline document, which has the same problem as what you have written
> - there is no way to learn anything useful about what's identified, so
> if all you have is the specifications on hand, good practice would be
> to *never* use 200 URIs in RDF, without additional explanation not
> obtained using GET (such as hasInstanceUri). I am just saying that, to
> make *me* happy, any modification of httpRange-14(a) ought to solve
> the important problem, which is that you ought to be able to learn
> *something* useful about what the URI identifies by doing a GET,
> however weak. Knowing that the result is a representation, or that the
> referent is an information resource, is not useful (the Flickr
> example), but knowing (or being allowed to expect) that it has the
> retrieved representation as content would be useful (as demonstrated
> empirically), or alternatively knowing that it's a description would
> also be useful, as you suggest. But one can't deduce this from the
> change you propose.  Unless the proposal says that what you GET is a
> description, or a "nominal" description or should be taken to be a
> description if it looks like one or anything of that ilk, it won't
> accomplish your purpose.

My purpose is only to document how you can find RDF mentioning a URI
starting from the URI. In the best case the RDF might contain something
useful, but that is up to the provider of the RDF. Still, in a lot of
cases you won't learn anything by doing a GET on a random URI. This is
the state of the web today. There still are a lot of very useful URIs
with accompanying explicit RDF on the web though. We should work on
increasing this amount a step at a time, not try to take a short-cut
by defining all RDF-less URIs without consulting the URI owner. Sorry
about the rant. Anyway, I'm afraid my proposal won't make you happy.

> Let's just look at one example. Do a GET of the URI
> http://purl.org/dc/elements/1.1/ and you will get some RDF.  In it you
> find this statement:
> 
> <http://purl.org/dc/elements/1.1/subject>
>     <rdfs:isDefinedBy>
>     <http://purl.org/dc/elements/1.1/>.
> 
> To understand this you at least need to know *something* about what
> the URI http://purl.org/dc/elements/1.1/ refers to. Of course you learn
> something about it by reading the RDF at hand, but this will not always
> be the case. Let's see what one might find out from a GET:
> 
> httpRange-14(a) only tells you it's an information resource - it could
> be any IR at all. That's not particularly useful.
> 
> Everyone seems to understand that it means an information resource one
> of whose instances is what GET http://purl.org/dc/elements/1.1/
> retrieves, even though this is not written down anywhere in any
> specification. I have suggested codifying this somehow, because I
> think it's useful (even if you aren't sure what the *other*
> representations are - in this particular case I'd be willing to bet
> that there aren't others, at least not at present - but no matter).
> But so far I haven't convinced anyone else; perhaps I am wrong; that
> doesn't matter here.

This is a very interesting example. Doing a GET on
<http://purl.org/dc/elements/1.1/>
returns a 302 redirect to
<http://dublincore.org/20120/10/11/dcelements.rdf#> (note the hash).
Another GET gives us an application/rdf+xml document.
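
A side note on the hash: as far as I know, a fragment is never sent to
the server, so the second GET really goes to the hashless URI. A minimal
sketch using Python's standard urllib.parse (the variable names are mine
and purely illustrative; no network access is involved):

```python
from urllib.parse import urldefrag

# The Location value reported by the 302 response, copied from above.
location = "http://dublincore.org/20120/10/11/dcelements.rdf#"

# Split off the fragment; only document_uri would appear in the HTTP request.
document_uri, fragment = urldefrag(location)

print(document_uri)
print(repr(fragment))
```

So whichever of the two forms one prefers to write down, the document
that is actually retrieved is the one without the hash.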

It will be interesting to see how this works out in the other proposals.
httpRange-14 doesn't say *anything* about this case, and leaves
<http://purl.org/dc/elements/1.1/>
undefined (it is not an IR as you supposed it was, and it is apparently
not in the complement of IR either, though I can't really follow that
logic). If this deprecates your RDF statement - which is part of the
explicit RDF graph we just retrieved - things get complicated.

In my proposal what
<http://purl.org/dc/elements/1.1/>
means is irrelevant to learning more about
<http://purl.org/dc/elements/1.1/subject>.
You would access
<http://purl.org/dc/elements/1.1/>,
try to find some RDF, and look for more statements containing
<http://purl.org/dc/elements/1.1/subject>.
If you want to know more about
<http://purl.org/dc/elements/1.1/>
you have to start with *this* URI and redo the algorithm. In this case
we already have the RDF graph extracted from its representation cached,
and we can tell that it has a dcterms:title, dcterms:publisher, and a
dcterms:modified date. Unfortunately, it has no rdf:type (class). Should
the statement be deprecated? I don't think so. If I really cared I'd
probably contact Dublin Core and ask them to add the statement

<http://purl.org/dc/elements/1.1/> rdf:type owl:Ontology .
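
The lookup described above can be sketched roughly as follows. This is
only an illustration under my own assumptions: triples are modelled as
plain string tuples, GRAPH stands in for the graph already parsed from
the retrieved representation, the literal values are placeholders, and
statements_mentioning is a hypothetical helper, not part of any
specification or library.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

DC = "http://purl.org/dc/elements/1.1/"

# Stand-in for the cached graph; the object literals are placeholders,
# not the values actually published by Dublin Core.
GRAPH: List[Triple] = [
    (DC + "subject", "rdfs:isDefinedBy", DC),
    (DC, "dcterms:title", "(title literal)"),
    (DC, "dcterms:publisher", "(publisher)"),
    (DC, "dcterms:modified", "(date literal)"),
]


def statements_mentioning(uri: str, graph: List[Triple]) -> List[Triple]:
    """Return every triple in which `uri` is the subject, predicate or object."""
    return [triple for triple in graph if uri in triple]


# Start from one URI and collect what the graph says about it.
about_subject = statements_mentioning(DC + "subject", GRAPH)

# To learn about the namespace URI, start again with *that* URI.
about_namespace = statements_mentioning(DC, GRAPH)
has_type = any(p == "rdf:type" for _, p, _ in about_namespace)
```

Note that nothing in the first lookup depends on what the namespace URI
itself refers to; it is just another term that happens to appear in the
statements found.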

> According to the baseline after your change proposal is applied,
> nothing at all is said about this case - the GET gives no information
> at all about what that URI refers to. In my view that would say that
> there is something inadequate ("bad practice") about that RDF and
> something should be added to it to make its intent clearer - like a
> hasInstanceUri assertion, for example.

Like
<http://purl.org/dc/elements/1.1/>
:hasInstanceURI
<http://dublincore.org/20120/10/11/dcelements.rdf#> .

Or perhaps
<http://purl.org/dc/elements/1.1/>
:hasInstanceURI
<http://dublincore.org/20120/10/11/dcelements.rdf> .

You could, but I don't see the point of this.

> According to what you seem to be saying, you would like for what is
> retrieved to be taken as a description of what
> http://purl.org/dc/elements/1.1/ refers to. Maybe this is obvious, but
> I think it's pretty important for a specification to state the
> obvious, especially in these very murky waters. So that's what I'm
> recommending you do.

Yes, I'll try to make this clear since it is surprisingly hard to get the
point across.

> As for whether the change puts large amounts of deployed RDF at risk,
> I am thinking about this one, and have not yet really decided what I
> think. It seems to me that it does, but I agree with you that it was
> already at risk, so the question is exactly what risks and benefits,
> exactly, we're talking about; especially what explains successful uses
> of description-free (or description-deficient) URIs in RDF - if indeed
> there are any.
> 
> This is really the same as the question of what makes the Web work at
> all, whose answer is not clear. It has something to do with pervasive,
> unjustified speculation or gambling that what the next person GETs
> will be similar enough to what you GOT, for the purposes at hand, that
> it's worth putting a URI in an href=, as opposed to holding out for
> some more rigorous and reliable form of reference, of the sort a
> librarian might be happy with. Logically, the Web shouldn't work, yet
> somehow it does. It's very odd.

Exactly! I don't know either, but I think it has to do with OWA (the
open world assumption) and monotonicity ;-)

Thanks a lot for all your comments. I'm finally starting to understand
which points people are concerned about, and which points will only
lead to unrelated discussions.

I'll revise my proposal and send a final version later today.

Tore

Received on Thursday, 29 March 2012 00:59:12 UTC