[ratholes, reification, risk] poison-URIref testcases from Dan Brickley on 2002-03-22 (w3c-rdfcore-wg@w3.org from March 2002)

From: Dan Brickley <danbri@w3.org>
Date: Fri, 22 Mar 2002 09:18:00 -0500 (EST)
To: <w3c-rdfcore-wg@w3.org>
cc: Pat Hayes <phayes@ai.uwf.edu>
Message-ID: <Pine.LNX.4.30.0203220813060.3166-100000@tux.w3.org>
In http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Feb/0569.html
Re: Outstanding Issues - an RDF statement is an assertion

[[
>(Whether there any cases where a syntatically wellformed RDF/XML document
>is neither true nor false, but (something like) meaningless may be worth
>some attention (in Primer, MT?). Bogus URIs, for example. Probably a
>rathole, forget I mentioned it.)

Definitily a rathole, but might be worth putting up a warning flag.
]]


So I'd like to put up a warning flag related to this, specifically the
possible impact the presence of a bogus ("poison", non-referring) URIref
might have on the truth-prospects of seemingly healthy RDF/XML documents.

I'm yet to see a W3C or IETF spec that guarantees all urirefs refer; so in
general I proceed with caution and assume that some (eg. typos, some
UUIDs) might not be names for anything. The MT explicitly disowns this
problem; it's not clear which spec(s) should own it. So I tried to write
some test cases as a start towards understanding how we might (at least in
Primer) flag up the awkward corner cases and strategies for avoiding them.

Here are two RDF/XML test cases:

[rat1.rdf]

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:foaf="http://xmlns.com/foaf/0.1/">
 <foaf:Person foaf:name="Jan Grant">
  <foaf:mbox rdf:resource="mailto:jan.grant@bristol.ac.uk"/>
 </foaf:Person>
</rdf:RDF>



[rat2.rdf]

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:foaf="http://xmlns.com/foaf/0.1/">
 <foaf:Person foaf:name="Jan Grant">
  <foaf:mbox rdf:resource="mailto:jan.grant@bristol.ac.uk"/>
 </foaf:Person>
 <rdf:Statement>
  <rdf:object rdf:resource="poison:uriref-like-string-naming-nothing"/>
  <!-- note that rdf:subject used here wouldn't be so interesting -->
 </rdf:Statement>
</rdf:RDF>


In brief, I think we need to decide whether to seek assurance from the W3C
Technical Architecture Group (TAG) that there are no circumstances in
which a URIref identifier (or URIref-formatted-string) fails to refer.
If we don't get that assurance, some RDF/XML documents might not be
carriers for propositional content (since --loosly put-- bits of them
will be meaningless).

We can think about this in terms of the test cases above, by asking
ourselves what the world would have to be like if assertions of [rat1.rdf]
and [rat2.rdf] were to be true.

If there are no worlds in which the uriref poison:uri-like-string-naming-nothing
actually refers, [rat2.rdf]'s truth prospects are dismal compared to
[rat1.rdf]'s. I believe many uses of RDF, and in particular of the
rdf:Statement, rdf:predicate, rdf:subject etc vocabulary will find this
counter intuitive to say the least. It isn't up to RDF Core, the RDF
 MT etc to rule on whether there are in fact 'poison URIs'. We need to
find out by seeking clarification from other groups (TAG, URI).

The MT spec does a fine job of avoiding such potential ratholes, through
making (very reasonable) simplifying assumptions, and through
highlighting areas that are still research topics. I'm
concerned that we make sure (perhaps by talking to the TAG) that *our*
understanding of the URI reference machinery of the Web is in sync with
the rest of the Web communities, since RDF-based systems are going to be
used for encoding and reasoning about URI-based data from a variety of
other W3C XML languages (SVG, MathML, OWL, X-Link etc etc.).

Dan


ps. a summary from IRC chat with JanG (don't blame him for any goofs in my
presentation of this though; we only manage to agree every other day)

<jang> the "meaning" of a piece of RDF is the conjunction of the meaning  of individual triples
<jang> so if a piece of RDF contains just _one_ triple that is  "meaningless", it renders the whole thing meaningless.
<jang> by using a de re quoting mechanism, the reification of such a piece  of RDF is meaningless
<jang> and so is any RDF that contains it
<jang> we avoid this by ensuring that we have a default "meaning" for any  piece of RDF


quick reference appendix: MT excerpts regarding URI references

The MT spec (valentine days edition) tells us:

	There are several aspects of meaning in RDF which are ignored by this
	semantics. It treats URIs as simple names, ignoring aspects of meaning
	encoded in particular URI forms [RFC 2396]. It does not provide any
	analysis of time-varying data or of changes to URI denotations.

Noting that "Some of these may be covered by future extensions of the
model theory."

I've excerpted here the main things the MT says about URI references:

	http://www.w3.org/TR/2002/WD-rdf-mt-20020214/

	 A uriref is defined to be a URI reference in the sense of [RFC 2396].We
	do not distinguish between urirefs and uriref nodes because urirefs are
	considered to be nodes in themselves.

	...urirefs are considered to have a 'global' meaning

	RDF uses two kinds of referring expression, urirefs and literals. We make
	very simple and basic assumptions about these. Urirefs are treated as
	logical constants, i.e. as names which denote resources; no assumptions
	are made about the nature of resources.

	An interpretation assigns meanings to symbols in a particular vocabulary
	of urirefs.


	....the model theory simply assumes that such lexical issues have been
	resolved in some way that is globally coherent, so that a single uriref
	can be taken to have the same meaning wherever it occurs.

	 Similarly, the model theory given here has no special provision for
	tracking temporal changes; it assumes, implicitly, that urirefs have the
	same meaning whenever they occur.





	Asserting an RDF graph amounts to claiming that it is true, which is
	another way of saying that the world it describes is, in fact, so arranged
	as to be an interpretation which makes it true. In other words, asserting
	a piece of RDF amounts to asserting a constraint on the possible ways the
	world might be. Notice that there is no presumption here that any RDF
	graph contains enough information to specify a single unique
	interpretation. It is very difficult, and usually impossible, to assert
	enough in any language to completely constrain the interpretations to a
	single possible world, so there is no such thing as 'the' unique RDF
	interpretation. In general, increasing the size of a graph decreases the
	set of interpretations that an assertion of the graph allows to be true.

	The use of 'public' URIs in an RDF graph is often taken to imply that an
	assertion of the graph implicitly assents to the truth of other RDF graphs
	that define the meaning of that URI. To apply the model theory to this
	kind of situation, one should think of the restriction on the world
	represented by an assertion of the merge of the asserted graph together
	with whatever RDF graphs are assumed to define the public vocabulary, in
	order to fully convey the intended meaning of the 'public' assertion.


	This only applies to uses of RDF that are intended to be the assertion of
	propositional content. A fully adequate account of what it means to make
	an assertion in a Web context is a research problem that is beyond the
	scope of this document.
Received on Friday, 22 March 2002 09:18:01 UTC