5.5 Dereferenceable URI refers to chimera entity

Current recommended practice is to use a dereferenceable hashless URI 'http://example/p16' to refer to the information resource at that URI, IR('http://example/p16') (see 7.3 Using a URI to refer to the information resource accessible via that URI). To use an http: scheme 'slash URI' to refer to anything else, recommended practice is to use a 303 redirect. However, to address performance and deployment difficulties with 303, it has been suggested that a definition of a URI could be published and discovered at that URI directly - that is, that the URI dereferences directly (with a 200 status code) to a document containing its own definition, and its meaning should be obtained from that definition instead of from the httpRange-14 rule regarding information resources. [TODO: Say somewhere what the httpRange-14 rule is] In this section we ask whether this approach can work without disrupting the way metadata is written and used.

Suppose that Alice wants to use the URI 'http://example/p16' to refer to a canoe. She publishes a definition containing the following at 'http://example/p16':

# Graph gd:
<http://example/p16> a foo:Canoe .
<http://example/p16> foo:mass 2140 .
<http://example/p16> foaf:name "Assabet Angler" .

Bob then comes along and dereferences http://example/p16, obtaining Alice's graph gd and an HTTP 200 status code.  Because of the 200 status code, Bob applies the httpRange-14 rule and concludes the following:

# Graph gh:
<http://example/p16> :accessibleVia "http://example/p16" .
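
Bob's inference step can be sketched in Python. This is only an illustration under stated assumptions: triples are modeled as plain tuples of N-Triples-style strings, the function name httprange14_inference is hypothetical, and the rule is taken to fire on a 200 status code exactly as in this example.

```python
ACCESSIBLE_VIA = ":accessibleVia"  # stands in for the property used in graph gh


def httprange14_inference(uri, status_code):
    """Return the triples Bob concludes after dereferencing `uri`.

    Terms are N-Triples-style strings: "<...>" for a URI, '"..."' for a literal.
    (Hypothetical helper; not part of any library.)
    """
    if status_code == 200:
        # With a 200 response, Bob takes the URI to name the thing
        # that is accessible via that same URI.
        return {("<%s>" % uri, ACCESSIBLE_VIA, '"%s"' % uri)}
    # Any other status (for example 303) licenses no such conclusion here.
    return set()


gh = httprange14_inference("http://example/p16", 200)
```

With a 303 status code the function returns an empty set, so no assertion about <http://example/p16> itself is produced.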

Bob then publishes the following metadata about IR('http://example/p16'):

# Graph gb:
<http://example/p16> dc:creator "Alice" .
<http://example/p16> dc:title "All about the Assabet Angler" .

5.5.1 Carol

Carol wishes to use the merge of graphs gd and gb to print information about canoes and the web pages that describe them, such as the following:

Canoe name: Assabet Angler
Canoe mass: 2140
Location of canoe description: http://example/p16
Title of canoe description: "All about the Assabet Angler"
Author of canoe description: Alice

To generate this information from her data, Carol's application uses the following SPARQL:

# Query qc:
SELECT ?name ?mass ?uri ?title ?author
WHERE {
?c a foo:Canoe .
?c foaf:name ?name .
?c foo:mass ?mass .
?c rdfs:isDefinedBy ?ir .
?ir :accessibleVia ?uri .
?ir dc:title ?title .
?ir dc:creator ?author . }

Furthermore, Carol has the following implicit rules ri built in to her application (or world view):

# Implicit rules ri:
1. For any resource r, if r has a dc:creator property
then r is a massless entity.

2. The set of massless entities is disjoint with the set of foo:Canoes.

Carol merges RDF graphs gd and gb, and applies standard RDF and OWL semantics.  No inconsistency is detected.  Carol then applies her implicit rules ri and discovers a contradiction: <http://example/p16> is both a massless entity and a foo:Canoe, but those two classes are supposed to be disjoint. 
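
The way Carol's rules ri turn a consistent merge into a contradiction can be sketched as follows. This is a minimal illustration, not Carol's actual application: triples are modeled as tuples of strings, the two rules are hard-coded as the hypothetical functions apply_rule_1 and find_contradictions, and the merge is plain set union because the graphs contain no blank nodes.

```python
P16 = "<http://example/p16>"

# Graph gd (Alice's definition) and graph gb (Bob's metadata).
gd = {
    (P16, "a", "foo:Canoe"),
    (P16, "foo:mass", "2140"),
    (P16, "foaf:name", '"Assabet Angler"'),
}
gb = {
    (P16, "dc:creator", '"Alice"'),
    (P16, "dc:title", '"All about the Assabet Angler"'),
}


def apply_rule_1(graph):
    """Rule 1: every resource with a dc:creator property is a massless entity."""
    return {s for (s, p, o) in graph if p == "dc:creator"}


def find_contradictions(graph):
    """Rule 2: massless entities and foo:Canoes are disjoint.

    Returns the resources that fall into both sets.
    """
    massless = apply_rule_1(graph)
    canoes = {s for (s, p, o) in graph if p == "a" and o == "foo:Canoe"}
    return massless & canoes


merged = gd | gb  # merging ground graphs is set union
conflicts = find_contradictions(merged)
```

Run against the merge, the check reports <http://example/p16>; run against gd alone or gb alone, it reports nothing.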

Carol's problem is caused by the combination of graphs gd, gb and implicit rules ri.  If any one of these three components were eliminated, there would be no contradiction.  However, since Carol's implicit rules ri are built in to her application (or world view) she is unable to change or eliminate that component, and instead views the situation as a problem of contaminated data.  What should Carol do?

Option 1: Merge gd and gb, then selectively discard assertions until the contradiction is avoided.  This approach is unlikely to be satisfactory in the general case, for several reasons: (a) it may not be obvious which assertions treat <http://example/p16> as a massless entity and which treat it as a foo:Canoe, as several steps of inference may be involved; (b) some properties might apply to both massless entities and foo:Canoes, so it may not be easy (or even possible) to know which assertions using such properties should be retained; (c) assertions that are discarded to avoid contradiction may be some of the very assertions that the application needs; and (d) this option is likely to require significant manual effort.

Option 2: Split the identity of <http://example/p16> before the graphs are merged.  This means changing the assertions in at least one of the graphs to use a different URI or bnode instead of http://example/p16, so that the assertions in the two graphs will not be about the same resource when the graphs are merged.  This is described in http://dbooth.org/2010/ambiguity/paper.html#splitting .  The initial split can be readily automated, as it merely involves the proper substitution of one URI for another in the graph.  (Side note: normally this operation would be performed on a graph representing the ontological closure of the original graph; however, in this example we are assuming that the original graph is its own ontological closure: no assertions beyond those that are explicitly stated will be assumed.)

Carol somehow suspects -- perhaps because of the contradiction that she incurred -- that the RDF definition of <http://example/p16> was improperly published directly at http://example/p16 instead of being published via a 303 redirect.  She dereferences http://example/p16, sees that the content returned is graph gd and the HTTP status code is 200, and concludes that her suspicion was correct.  She then notes that Bob's graph gb is all about the web-accessible thing that has URI http://example/p16.  She therefore decides that option 2 -- splitting -- will be the easiest solution to her problem.

Carol mints a new URI u2 to denote the web-accessible thing that Bob's graph discusses.  She then properly substitutes <u2> for every occurrence of <http://example/p16> in gb to produce graph gb2:

# Graph gb2:
<u2> dc:creator "Alice" .
<u2> dc:title "All about the Assabet Angler" .
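
The substitution that produces gb2 can be sketched as a small function. This is an illustrative sketch only: the name split_identity is hypothetical, triples are modeled as tuples of N-Triples-style strings, and "<u2>" stands for whatever URI Carol actually mints.

```python
def split_identity(graph, old, new):
    """Return a copy of `graph` with every occurrence of term `old` replaced by `new`.

    Matching is on whole terms, so a string literal that merely contains
    the same characters as the URI is left untouched.
    """
    def swap(term):
        return new if term == old else term

    return {(swap(s), swap(p), swap(o)) for (s, p, o) in graph}


# Graph gb as published by Bob.
gb = {
    ("<http://example/p16>", "dc:creator", '"Alice"'),
    ("<http://example/p16>", "dc:title", '"All about the Assabet Angler"'),
}

gb2 = split_identity(gb, "<http://example/p16>", "<u2>")
```

The original graph gb is not modified, so Carol can still compare the two versions if the split later needs to be revisited.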

Carol's query qc relies on finding an rdfs:isDefinedBy assertion that links information about a canoe with information about a web page describing that canoe.  Normally Carol adds this assertion to her graph by applying the follow-your-nose principle [add explanation or reference].  Thus, if dereferencing http://example/p16 yields a 303 redirect to a new URI u, and dereferencing u yields a 200 status code with graph gd, then Carol adds the following assertions to her merged graph:

# Graph gr:
<u> :accessibleVia "u" .
<http://example/p16> rdfs:isDefinedBy <u> .

However, in this case Carol obtained a 200 status code upon dereferencing http://example/p16.  Therefore, to join the data in graphs gd and gb2, Carol writes the following additional RDF assertions to state that the canoe denoted by <http://example/p16> is defined by the web-accessible thing that is now called <u2> in gb2:

# Graph gj:
<u2> :accessibleVia "http://example/p16" .
<http://example/p16> rdfs:isDefinedBy <u2> .
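
Graphs gr and gj have the same shape and differ only in which URI names the description and where it is located. A minimal sketch, assuming triples as tuples of strings and a hypothetical helper named join_assertions, makes the parallel explicit:

```python
def join_assertions(thing, description, location):
    """Link `thing` to the web-accessible `description` found at `location`."""
    return {
        (description, ":accessibleVia", '"%s"' % location),
        (thing, "rdfs:isDefinedBy", description),
    }


# 303 case: the description has its own URI u (graph gr).
gr = join_assertions("<http://example/p16>", "<u>", "u")

# 200 case after splitting: the description is renamed u2 but is
# still located at http://example/p16 (graph gj).
gj = join_assertions("<http://example/p16>", "<u2>", "http://example/p16")
```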

Carol now merges graphs gd, gb2 and gj to produce the following graph:

# Graph gm (merge of gd, gb2 and gj):
<u2> :accessibleVia "http://example/p16" .
<http://example/p16> rdfs:isDefinedBy <u2> .
<u2> dc:creator "Alice" .
<u2> dc:title "All about the Assabet Angler" .
<http://example/p16> a foo:Canoe .
<http://example/p16> foo:mass 2140 .
<http://example/p16> foaf:name "Assabet Angler" .

Carol is pleased that her query qc and her application work as desired against graph gm.
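
To check that qc really does find a solution in the merged graph, the query's triple patterns can be evaluated with a small matcher. This is a sketch under stated assumptions, not a SPARQL engine: the function match is hypothetical, handles only basic graph patterns, and treats every term as an opaque string.

```python
def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern in `patterns`."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for pat, val in zip(first, triple):
            if pat.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)


P16 = "<http://example/p16>"

# The merge of graphs gd, gb2 and gj.
merged = {
    ("<u2>", ":accessibleVia", '"http://example/p16"'),
    (P16, "rdfs:isDefinedBy", "<u2>"),
    ("<u2>", "dc:creator", '"Alice"'),
    ("<u2>", "dc:title", '"All about the Assabet Angler"'),
    (P16, "a", "foo:Canoe"),
    (P16, "foo:mass", "2140"),
    (P16, "foaf:name", '"Assabet Angler"'),
}

# The WHERE clause of query qc.
qc = [
    ("?c", "a", "foo:Canoe"),
    ("?c", "foaf:name", "?name"),
    ("?c", "foo:mass", "?mass"),
    ("?c", "rdfs:isDefinedBy", "?ir"),
    ("?ir", ":accessibleVia", "?uri"),
    ("?ir", "dc:title", "?title"),
    ("?ir", "dc:creator", "?author"),
]

rows = list(match(merged, qc))
```

The single solution binds ?c to the canoe and ?ir to <u2>, so the canoe properties and the description properties are reported about two distinct resources.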

5.5.2 Derek

Derek's application is identical to Carol's application except that instead of having implicit rules ri built in to the application, they are expressed explicitly as the following additional assertions, which are merged with his data:

# Graph re:
{ ?r dc:creator ?v } => { ?r a foo:MasslessEntity } .
foo:Canoe owl:disjointWith foo:MasslessEntity .

The RDF processor in Derek's application understands the OWL semantic extensions and N3 rules.

Derek's application has the exact same functionality and limitations as Carol's application -- in particular, he still needs to remove assertions or split the identity of <http://example/p16> for his application to work -- but it does offer an important maintenance convenience: Derek can easily change his rules re, whereas Carol's implicit rules ri were built in to her application (or world view).
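
The maintenance advantage of explicit rules can be illustrated by holding them as data. The following is a rough sketch, not an N3 or OWL reasoner: the names property_rules, disjoint_classes, apply_rules and violations are hypothetical, and only the two rule shapes used in graph re are supported.

```python
P16 = "<http://example/p16>"
gd = {
    (P16, "a", "foo:Canoe"),
    (P16, "foo:mass", "2140"),
    (P16, "foaf:name", '"Assabet Angler"'),
}
gb = {
    (P16, "dc:creator", '"Alice"'),
    (P16, "dc:title", '"All about the Assabet Angler"'),
}

# Derek's rules re, held as editable data rather than hard-coded logic.
property_rules = [("dc:creator", "foo:MasslessEntity")]
disjoint_classes = [("foo:Canoe", "foo:MasslessEntity")]


def apply_rules(graph, property_rules):
    """Add `s a cls` for every subject s that has a property named in a rule."""
    inferred = set(graph)
    for prop, cls in property_rules:
        inferred |= {(s, "a", cls) for (s, p, o) in graph if p == prop}
    return inferred


def violations(graph, disjoint_classes):
    """Return subjects typed with both classes of any disjoint pair."""
    found = set()
    for c1, c2 in disjoint_classes:
        in_c1 = {s for (s, p, o) in graph if p == "a" and o == c1}
        in_c2 = {s for (s, p, o) in graph if p == "a" and o == c2}
        found |= in_c1 & in_c2
    return found


with_rules = violations(apply_rules(gd | gb, property_rules), disjoint_classes)
without_rules = violations(apply_rules(gd | gb, []), [])
```

With the rules in place the merge of gd and gb yields a violation for <http://example/p16>; with both lists emptied, the same data yields none, and no application code had to change.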

5.5.3 Erin

Erin's application is also identical to Carol's application (or Derek's application) except that it has neither implicit rules ri nor explicit rules re.  In short, it has no disjointness assumption.

Erin's life is significantly easier than Carol's or Derek's, as Erin's application produces correct output given the merge of gd and gb without requiring the effort of either the manual removal of assertions or identity splitting.

On the other hand, Erin realizes that if she modifies her application to use assertions involving properties that could apply either to a foo:Canoe or to a web-accessible thing, such as ":zoe :likes <foo> .", then she will need to take other measures, such as splitting the identity of <http://example/p16>, to avoid falsely attributing such an assertion to the wrong aspect of <http://example/p16>.

5.5.4 Frank

Frank's application also operates on the merge of graphs gd and gb, but has half of the functionality of Carol's.  It does not use implicit rules ri or explicit rules re.  It cares only about canoe information -- not canoe descriptions -- such as:

Canoe name: Assabet Angler
Canoe mass: 2140

It generates this output based on the following query:

# Query qf:
SELECT ?name ?mass
WHERE {
?c a foo:Canoe .
?c foaf:name ?name .
?c foo:mass ?mass . }

Since Frank's application only cares about data pertaining to the foo:Canoe aspect of <http://example/p16>, his application is untroubled by the existence of extraneous assertions that apply to the web-accessible aspect of <http://example/p16>.
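
That Bob's assertions are invisible to Frank's query can be checked with a short sketch. The function canoe_report is a hypothetical stand-in for query qf, written directly over triples modeled as tuples of strings rather than run through a SPARQL engine.

```python
P16 = "<http://example/p16>"
gd = {
    (P16, "a", "foo:Canoe"),
    (P16, "foo:mass", "2140"),
    (P16, "foaf:name", '"Assabet Angler"'),
}
gb = {
    (P16, "dc:creator", '"Alice"'),
    (P16, "dc:title", '"All about the Assabet Angler"'),
}


def canoe_report(graph):
    """Return sorted (name, mass) pairs for every foo:Canoe in `graph`."""
    canoes = {s for (s, p, o) in graph if p == "a" and o == "foo:Canoe"}
    rows = []
    for c in canoes:
        names = [o for (s, p, o) in graph if s == c and p == "foaf:name"]
        masses = [o for (s, p, o) in graph if s == c and p == "foo:mass"]
        rows.extend((n, m) for n in names for m in masses)
    return sorted(rows)


report_merged = canoe_report(gd | gb)
report_gd_only = canoe_report(gd)
```

Both calls return the same single row, because none of the patterns in qf mention dc:creator or dc:title.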

However, like Erin, Frank also realizes that if he were to modify his application to make use of assertions involving properties that could equally apply to a foo:Canoe or a web-accessible thing, then he would have to take other measures such as splitting the identity of <http://example/p16>.

5.5.5 Gail

Gail's application is the complement of Frank's application: it cares only about canoe descriptions -- not canoes -- such as:

Location of canoe description: http://example/p16
Title of canoe description: "All about the Assabet Angler"
Author of canoe description: Alice

It does not use implicit rules ri or explicit rules re, and like Frank's application, Gail's application works fine on the merge of gd and gb in spite of the fact that the data contains extraneous assertions about foo:Canoes.

Gail's application has the same maintenance caveats as Frank's or Erin's: if Gail were to modify her application to make use of assertions involving properties that could equally apply to a foo:Canoe or a web-accessible thing, then she would have to take other measures such as splitting the identity of <http://example/p16>.