Re: URIs, used in RDF, that do not have associated documentation from トーレ　エリクソン on 2012-03-27 (www-tag@w3.org from March 2012)

From: トーレ　エリクソン <tore.eriksson@po.rd.taisho.co.jp>
Date: Tue, 27 Mar 2012 09:49:07 +0900
To: Jonathan A Rees <rees@mumble.net>
Cc: www-tag@w3.org
Message-Id: <20120327094907.AA1A.9D98B4E7@po.rd.taisho.co.jp>
Jonathan,

I hope you don't mind me repeating my offline response to your other
mail here as well.

On Tue, Mar 27, 2012 at 12:42 AM, Jonathan A Rees <rees@mumble.net> wrote:
> The question arises often: What are examples of RDF in the wild, where
> URIs are used that do not have associated documentation (i.e. RDF that
> tells you what the URI refers to)?  That is, what are some situations
> where the httpRange-14(a) rule might apply in practice - where linked
> data meets the non-RDF Web, so to speak?

And I would like to check if this lack of RDF documentation really is
a problem. If not, one of the justifications for httpRange-14 is
nullified.

> Remember that I've stated my dismay that httpRange-14(a) says "is an
> information resource" rather than addressing the ambiguity mentioned
> in Fielding's email (as illustrated by the Flickr and Jamendo cases).
> httpRange-14(a) as written doesn't really help, except that its
> authors and nearly everyone else have interpreted it to resolve the
> ambiguity in a particular way - that the URI refers to what you
> retrieve (generically if you will), not to what is described by what
> you retrieve. This interpretation *has* been helpful because it lets
> you use these RDF-less URIs in RDF and be understood. That the
> resolution didn't say what was meant was a colossal screwup IMO. But
> let's set that aside and just look at the question.

Saying that the URI refers to what you retrieve is not consistent with
the HTTP specification, in which resources and representations are
clearly separate entities. Saying that they are equivalent confounds them,
and one of the axioms of RDF is that two things should not use the same
URI.

> I don't have the tools on hand to answer this very satisfactorily. I
> hope someone with access to good infrastructure will study this
> question. I will just give the examples that come to me off the top of
> my head.

And I'll try to argue that the lack of descriptive RDF is not a practical
problem in any of these cases.

> If you look at, say,
> http://dbpedia.org/page/Paris,
> you find many RDF statements in which the object of the statement is
> given as a URI for which
> there is no descriptive RDF. Most notably we have the target of the foaf:page
> relation, but also thumbnail, wikiPageExternalLink, website, etc.
> Since this is true of every dbpedia page, we immediately have quite a
> few such URI occurrences. If this use of URIs were called into question, then
> dbpedia would have quite a bit of rewriting to do.

The Wikipedia example is perfectly coherent without the IR semantics; nothing
is invalidated or breaks when the class of the target resources is
unknown. If you add a range of foaf:Image to dbpedia-owl:thumbnail, you
would get semantics much more useful than the fact that the resource is
an IR.
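
As an illustration of the range point, here is a minimal sketch of the
standard RDFS range rule (rdfs3) over plain Python tuples. It assumes the
range declaration has actually been added, which is a suggestion in this
mail and not something DBpedia is known to publish, and the thumbnail URL
is invented for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
THUMBNAIL = "http://dbpedia.org/ontology/thumbnail"
FOAF_IMAGE = "http://xmlns.com/foaf/0.1/Image"


def apply_range_rule(triples):
    """Return the rdf:type triples entailed by rdfs:range declarations."""
    # Collect (property, class) pairs from the range declarations.
    ranges = {(s, o) for (s, p, o) in triples if p == RDFS_RANGE}
    inferred = set()
    for (s, p, o) in triples:
        for (prop, cls) in ranges:
            if p == prop:
                # The object of the property is an instance of the range class.
                inferred.add((o, RDF_TYPE, cls))
    return inferred


# Hypothetical data: the range triple is the suggested addition, and the
# thumbnail URL is made up.
graph = {
    (THUMBNAIL, RDFS_RANGE, FOAF_IMAGE),
    ("http://dbpedia.org/resource/Paris", THUMBNAIL,
     "http://example.org/paris-thumb.jpg"),
}
inferred = apply_range_rule(graph)
```

The target needs no RDF of its own for this to work: the type of the
thumbnail follows from the statement that points at it.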

> Any FOAF page that has homepage, publications, etc. (where the target
> lacks its own RDF, which is the normal case) would be affected.

As above, it would not be affected. The assertions work even without
target descriptions.

> License assertions are affected. It turns out the CC licenses have
> embedded RDFa that could be taken as documentation of the license URI;
> you might argue that this RDFa sort of implies that the URI refers to
> the license retrieved from the URI, not from httpRange-14(a). But I don't
> buy this. The RDFa doesn't really provide enough
> information to say it's the retrieved license itself, as opposed to
> some other resource. That is, there's no way to distinguish the
> license case from the Flickr case, so the fact that there is RDFa
> doesn't really help. It's httpRange-14(a) and the (poorly justified)
> resulting assumption that URIs generally refer to what is retrieved
> that makes this work, not the metadata.

The subject of RDF extracted from the embedded RDFa is 
  <http://creativecommons.org/licenses/by/1.0/>
From where I'm standing, this looks like perfectly ordinary URI
documentation. What makes this work is the explicit @about header in the
<head> element, and has absolutely nothing to do with retrieval.
Consider that it would even work if someone sent the HTML document to me
by e-mail, which is excellent. There are some triples that are incorrect
though:

<http://creativecommons.org/licenses/by/1.0/>
xhv:stylesheet
<http://creativecommons.org/includes/deed3.css> .

This statement confounds the HTML document with the license. Unfortunately
the current RDFa spec also doesn't separate the resource and the
representation correctly. This is something my proposal attempts to fix
though.
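
To make the confusion concrete, here is a rough sketch that splits triples
extracted from the page into those that plausibly describe the license and
those that describe the HTML document. This is not the mechanism of the
proposal mentioned above, only an illustration. The set of "document-level"
predicates is an assumption, and the title triple is hypothetical.

```python
XHV = "http://www.w3.org/1999/xhtml/vocab#"
LICENSE = "http://creativecommons.org/licenses/by/1.0/"

# Assumption: these predicates say something about the HTML page itself,
# not about the license the page describes.
DOCUMENT_PREDICATES = {XHV + "stylesheet"}


def split_triples(triples):
    """Partition triples into (about the resource, about the document)."""
    about_resource = [t for t in triples if t[1] not in DOCUMENT_PREDICATES]
    about_document = [t for t in triples if t[1] in DOCUMENT_PREDICATES]
    return about_resource, about_document


extracted = [
    # The triple quoted in the mail.
    (LICENSE, XHV + "stylesheet",
     "http://creativecommons.org/includes/deed3.css"),
    # Hypothetical triple, included only so the license side is not empty.
    (LICENSE, "http://purl.org/dc/terms/title", "Attribution 1.0"),
]
about_license, about_page = split_triples(extracted)
```

Both triples share the same subject URI, which is exactly the problem: a
consumer cannot tell them apart without knowing which predicates apply to
the representation.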

> I could hunt around for uses of Dublin Core metadata where the subject
> of the statements has no accompanying descriptive RDF. I'm sure
> they're out there. Remember that this was one of the first use cases for RDF.

If the subject has DC metadata then clearly it has descriptive RDF.
What am I missing here?

> POWDER should be similar to Dublin Core but I have no pointers to
> POWDER deployment. (POWDER's predecessors were *the* first RDF use
> case, if I understand the history correctly.)

I would need some specific examples of POWDER breaking as well.

> Any use of the RDFS vocabulary is going to be full of such URIs. Look
> at the target of almost any rdfs:isDefinedBy assertion and the target URI
> will usually fail to have descriptive RDF. E.g. see
> http://www.w3.org/1999/02/22-rdf-syntax-ns - it has a bunch of these
> assertions.

Same as above, targets aren't required to provide their own definitions.

> I would be willing to bet that any URI used to name an RDF graph - as
> one finds in, say, SPARQL - has no adequately descriptive RDF. The assumption is
> always that the URI refers to the graph that a retrieved
> representation serializes. (The graph/serialization confusion is a
> wart, but the graph is similar enough to its serializations that I'm
> willing to overlook the sloppiness here. But in any case - that
> confusion is a *different* question, there is no
> descriptive RDF for the URI, so this is an example.)

This problem goes away if you have named graphs that are HTTP resources.
The named graph is the resource and the serialisation is the
representation. Changes in the graph can be dealt with through the usual
HTTP headers that describe which representations are valid. Anyway, the
non-existence of descriptive RDF is not a problem.
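
A minimal sketch of that idea, assuming ETag and If-None-Match are the
headers in question (the mail only says "the usual HTTP headers"). The
graph is the resource, the serialisation is the representation, and the
entity tag changes whenever the graph does. The function names and the
example.org URIs are made up for illustration.

```python
import hashlib


def serialize(graph):
    """Serialise a set of (s, p, o) URI triples as sorted N-Triples lines."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in sorted(graph))


def etag_for(graph):
    """Derive a strong entity tag from the current serialisation."""
    digest = hashlib.sha256(serialize(graph).encode("utf-8")).hexdigest()
    return f'"{digest}"'


def respond(graph, if_none_match=None):
    """Return (status, headers, body) for a GET on the graph's URI.

    Simplified: handles a single If-None-Match value, no weak validators.
    """
    tag = etag_for(graph)
    if if_none_match == tag:
        # The client's cached representation still matches the graph.
        return 304, {"ETag": tag}, ""
    headers = {"ETag": tag, "Content-Type": "application/n-triples"}
    return 200, headers, serialize(graph)


# Hypothetical graph with made-up URIs.
graph = {("http://example.org/g#a", "http://example.org/g#knows",
          "http://example.org/g#b")}
status_first, headers, _ = respond(graph)
cached_tag = headers["ETag"]
status_unchanged, _, _ = respond(graph, if_none_match=cached_tag)

graph.add(("http://example.org/g#b", "http://example.org/g#knows",
           "http://example.org/g#a"))
status_changed, _, _ = respond(graph, if_none_match=cached_tag)
```

The URI keeps naming the same graph throughout; only the representation
and its validator change.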

> Additional examples welcome in followup.

Please.

> I am not aware of any instance (in the URI-refers-to-what-you-GET
> case) where, even when embedded RDF (RDFa usually) is provided, it is specific
> enough to rule out the possibility that the URI refers to some
> resource other than the one whose instances
> are retrieved. Usually this is a reasonable assumption, but any
> change that says you should not make such assumptions is going to be
> disruptive. It would be better, IMO, to codify the assumption
> (which is currently not written down anywhere) somehow, than to negate
> it.

Sorry, but this sounds like a straw-man to me. The URI refers to the resource
it denotes, that's all. The representations can describe the resource by
containing or linking to RDF which mentions the URI. It really is this
simple!

> Many of the change proposals that have come in, such as Jeni's (I
> think), do *not* say that you should not assume that these URIs refer
> to what is accessed. (Reread that if you have to.) But some of them
> do, and many raise the spectre of the ISSUE-14 screwup, and I'm
> disappointed no one's trying to fix. (Of course I didn't ask people to
> fix it, so that's my screwup.) (And I haven't read all submissions in
> detail so maybe someone does propose to fix it and I haven't
> discovered that yet.)

If you have anything that breaks, I'm all ears, but your examples convinced
me that my proposal is robust as it is.

> I am certainly sympathetic to the arguments against this phenomenon in
> principle. Most people just write these URIs without much thought as
> to what they refer to or why, and there could be cases where the
> intended meaning is not correctly expressed or understood. They don't
> think about content negotiation or change over time or the possibility
> that the URI might be interpreted as referring to what is described
> rather than the description (or whatever is retrieved) or that the
> "URI owner" might think the URI refers to something else. Amazingly
> the overall result is, in my opinion, quite consistent and useful, in
> spite of the opportunities for failure.

This is the amazing thing about the Web and the HTTP protocol! The
community should help people think about the semantics of their URIs and
codify them in RDF. Unfortunately, current best practice makes it harder
than necessary.


Regards,

Tore Eriksson
Received on Tuesday, 27 March 2012 00:49:37 UTC