Re: AW: {Disarmed} Re: blank nodes (once again) from Sandro Hawke on 2011-03-24 (semantic-web@w3.org from March 2011)

From: Sandro Hawke <sandro@w3.org>
Date: Thu, 24 Mar 2011 11:13:32 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: Michael Schneider <schneid@fzi.de>, Graham Klyne <GK-lists@ninebynine.org>, Dieter Fensel <dieter.fensel@sti2.at>, Enrico Franconi <franconi@inf.unibz.it>, Hugh Glaser <hg@ecs.soton.ac.uk>, Mark Wallace <mwallace@modusoperandi.com>, Alan Ruttenberg <alanruttenberg@gmail.com>, Reto Bachmann-Gmuer <reto.bachmann@trialox.org>, Ivan Shmakov <oneingray@gmail.com>, Ivan Shmakov <ivan@main.uusia.org>, "<semantic-web@w3.org>" <semantic-web@w3.org>
Message-ID: <1300979612.3138.2821.camel@waldron>
On Thu, 2011-03-24 at 09:45 -0500, Pat Hayes wrote:
> Michael, greetings.
> 
> Of course you are right. Which is why it would probably not be useful or practical to *change* the interpretation of blank nodes in RDF. On the other hand, it might be useful to define a simplified version of RDF which simply does not have blank nodes in it. They really are of very little practical use. 

I don't know of real data about this, and I may not be representative,
but I know when I write RDF by hand, and when I write software which
constructs RDF, I often find it easier to use blank nodes than to think
about what to name every item referred to in my content.

As I pointed out earlier, I think blank nodes are a convenience for the
speaker and an inconvenience for the (machine) listener.   Since they're
also a convenience for the listener when the listener is human, and
right now so much RDF isn't really being used by machines, I think the
sense of them as an overall convenience has persisted.


> Regrettable as it may be, there is now a large (and growing) community of RDF users who really do not care very much about OWL or RIF, certainly do not care a jot for the distinctions between the various species of OWL, use SPARQL only as an RDF version of SQL, and have absolutely no use for blank nodes and strongly advise their peers to avoid using them. The patterns of reasoning exemplified by blank node scoping are of no interest to them whatsoever. If anything, existential generalization is a nuisance, rather than a useful inference. They would be very happy with RDF engines which flag blank nodes as errors or (better) automatically skolemize them. 

I think there's a whole lot to be said for automatically Skolemizing
them.   To do it well requires some work, but I think it's feasible for
many kinds of deployment.

In particular, I think the system which first exposes the RDF content on
the Web should be the one which Skolemizes it, since it knows what URL
prefix to use.   (If there isn't one such system, then Skolemizing is a
problem.)   This system has the interesting challenge of minimizing
changes if/when it re-reads modified content destined for the same URL.
That's the most interesting problem in this space, to me....
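
To make the baseline concrete, here is a minimal sketch of the naive
approach, not a definitive implementation.  It assumes triples are plain
tuples of strings, that blank nodes are the terms starting with "_:", and
that the publisher mints names under a prefix of its own choosing; the
function names and the example.org prefix are made up for illustration.

```python
import uuid


def is_bnode(term):
    # Assumption: blank nodes are written "_:label", as in Turtle/N-Triples.
    return term.startswith("_:")


def skolemize(triples, url_prefix):
    """Replace every blank node with a fresh URI minted under url_prefix.

    The same blank node always gets the same URI within one call, so the
    pattern of node identity inside the graph is preserved.
    """
    names = {}

    def name(term):
        if not is_bnode(term):
            return term
        if term not in names:
            # Naive choice: a random UUID, so every run yields new names.
            names[term] = f"<{url_prefix}{uuid.uuid4().hex}>"
        return names[term]

    return [(name(s), p, name(o)) for s, p, o in triples]


graph = [("ex:s2", "ex:p2", "_:x"), ("_:x", "ex:p3", "ex:o")]
published = skolemize(graph, "http://example.org/U#")
```

Both occurrences of "_:x" end up with the same URI, but running it again
on the same input gives completely different names, which is exactly the
re-publishing problem.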

To rephrase that problem: given similar RDF graphs G1 and G2, and a
labeling of the blank nodes in G1 to produce G1', how do you produce a
labeling of the blank nodes in G2, G2', such that the differences
between G1' and G2' are as small as the differences between G1 and G2?

In practice, imagine I have a hand-authored page of Turtle with maybe
150 triples, much of it lists.  I click "publish" and it gets Skolemized
and published at URL U.  Then I change my mind about something, make a
tiny edit, click "re-publish" and it gets Skolemized again, and the new
version gets published at U.   If someone is watching U, I want them to
see that only a little change was made.  A naive (uuid) Skolemization
would make the change look huge, as every blank node got an entirely new
label.   
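
One possible heuristic, sketched below under loud assumptions: describe
each blank node by the triples it occurs in (with any other blank nodes
masked out), and when re-publishing, reuse the old name for any node
whose description is unchanged.  Triples are again plain tuples of
strings, and all the names here (signature, stable_labels, the
example.org URIs) are hypothetical, not from any existing library.

```python
from collections import defaultdict
from itertools import count


def is_bnode(term):
    # Assumption: blank nodes are written "_:label".
    return term.startswith("_:")


def bnodes(triples):
    """All blank nodes in subject or object position, in sorted order."""
    return sorted({t for s, _, o in triples for t in (s, o) if is_bnode(t)})


def signature(node, triples):
    """Describe a blank node by its one-hop neighbourhood.

    Other blank nodes are masked as "_:" because their local identifiers
    carry no meaning and may change between parses.
    """
    def mask(term):
        return "_:" if is_bnode(term) else term

    sig = []
    for s, p, o in triples:
        if s == node:
            sig.append(("out", p, mask(o)))
        if o == node:
            sig.append(("in", p, mask(s)))
    return tuple(sorted(sig))


def stable_labels(old_triples, old_labels, new_triples, mint):
    """Label the blank nodes of new_triples, reusing old names where possible.

    old_labels maps each blank node of old_triples to its published name.
    mint() is called for each node that has no unchanged counterpart.
    """
    available = defaultdict(list)
    for b in bnodes(old_triples):
        available[signature(b, old_triples)].append(old_labels[b])
    new_labels = {}
    for b in bnodes(new_triples):
        pool = available[signature(b, new_triples)]
        new_labels[b] = pool.pop(0) if pool else mint()
    return new_labels


g1 = [("ex:doc", "ex:author", "_:a"), ("_:a", "ex:name", '"Alice"'),
      ("ex:doc", "ex:editor", "_:b"), ("_:b", "ex:name", '"Bob"')]
g1_labels = {"_:a": "<http://example.org/U#n1>",
             "_:b": "<http://example.org/U#n2>"}

# After a tiny edit; the parser has also handed out different bnode ids.
g2 = [("ex:doc", "ex:author", "_:x"), ("_:x", "ex:name", '"Alice"'),
      ("ex:doc", "ex:editor", "_:y"), ("_:y", "ex:name", '"Robert"')]

counter = count(3)
g2_labels = stable_labels(g1, g1_labels, g2,
                          lambda: f"<http://example.org/U#n{next(counter)}>")
```

Here the author node keeps its old name and only the edited editor node
gets a new one.  The one-hop description is too weak for long lists,
where many cells look alike and are then matched in arbitrary order, so
a serious version would need to look further out from each node.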

> The one possible exception I can see is the use of bnodes to encode OWL syntax, using the RDF list construction. Clearly, one does not want to have OWL/RDF entailments ruined because a list has been given a name. This might require some special conventions; but in practice, again, this use of RDF has never been seriously intended to be used by RDF inference engines. Rather, this 'encoding' of OWL uses RDF as a serialization mechanism to move OWL around the Web via RDF portals. If we were to make this explicit, we could isolate this from RDF entailment regimes altogether. Which now that I think about it, might be a very good idea. 

Is there something in the OWL specs that says OWL doesn't work (or that
we're no longer in DL) if the nodes composing the lists are not blank?
That would be a problem.

Is the isolation you're talking about any different from "dark triples"?

   -- Sandro


> Pat
> 
> PS. other comments added in-line below.
> 
> 
> On Mar 24, 2011, at 8:59 AM, Michael Schneider wrote:
> 
> > Hi all!
> > 
> > Consider this: If you treat blank nodes in the way currently specified in the RDF spec, that is, as *locally scoped* to their containing graph, then it makes a clear difference whether their semantics is that of existential variables or that of (skolem) constants when it comes to logical consequence and, hence, to reasoning. 
> > 
> > For example, given the following two graphs:
> > 
> >   G1 = { 
> >       ex:s1 ex:p1 ex:o1 . 
> >       ex:s2 ex:p2 _:x . 
> >    }
> > 
> >   G2 = { 
> >       ex:s2 ex:p2 _:x . 
> >    }
> > 
> > Under current RDF simple entailment with existential semantics and local scope for blank nodes, G1 obviously entails G2. But if you modify RDF simple entailment to interpret blank nodes as /constants/, while still keeping them /local/ to their graphs, then this becomes a /non/-entailment.
> 
> But nobody has suggested that particular combination.
> 
> > The reason is that, on the one hand, now being constants, both occurrences of the name "_:x" in the two graphs denote some individual in the universe of discourse each but, on the other hand, since the "_:x" constants are local to their respective graph, there are interpretations under which they denote /different/ individuals. Just as different names within the same graph may denote different individuals. In fact, if you would merge G1 and G2, the blank nodes would need to be renamed (that's essentially what locality of blank nodes is all about!), leading to (modulo blank node identifiers):
> > 
> >    G12 = { 
> >        ex:s1 ex:p1 ex:o1 . 
> >        ex:s2 ex:p2 _:y . 
> >        ex:s2 ex:p2 _:z . 
> >    }
> > 
> > For comparison, under (current) existential semantics of blank nodes in RDF simple entailment, the merged graph G12 semantically implies both G1 and G2. In fact, G12 is even semantically equivalent to G1, i.e., G12 contains redundant triples. However, this would not be the case anymore when blank nodes are seen as /local constants/. In this case, G12 would be free of redundancy (all constants "ex:o1", "_:y" and "_:z" can be interpreted pairwise differently), and from this it becomes clear that G1 cannot imply G12. Further (and probably more surprisingly), neither G1 nor G2 semantically follows from G12, although G12 was created by merging the two original graphs. The reason is basically the same as the reason why G2 does not follow from G1 (there is no sharing of blank node identifiers between G12 and G1 as there was between G1 and G2, but this makes no difference under the local scope assumption). 
> > 
> > So, under the local constant view, the three graphs are semantically largely unrelated, while under the local existential view, they are largely related. I'd call this a significant difference!
> > 
> > Not only for reasoners would such a change from an existential to a constant view have considerable consequences (a reasoner that innocently infers G2 from G1 would be unsound ("broken") w.r.t. the changed RDF semantics, which would probably hit most if not all existing RDF(S) reasoners). Also SPARQL would be affected, including SPARQL 1.1. For example, the current Working Draft of SPARQL 1.1 has a nice example on blank nodes in query results:
> > 
> >    <http://www.w3.org/TR/2010/WD-sparql11-query-20101014/#BlankNodesInResults>
> > 
> > The given result sets and the discussion in the cited section are only really justified under the assumption of blank nodes having /local scope/ and /existential/ semantics. If one switched to /globally/ scoped /constants/ (as is the case for URIs), then the query result should consist of the original blank node names "_:a" and "_:b" and no others, which clearly conflicts with the result sets and the discussion in the cited section.
> 
> Indeed. But look how much effort is expended to explain carefully how local bnode identifiers don't act like global names, and now add in the amount of confusion and implementation difficulty this causes. Would it not be better if this simply were eliminated? That query would still *work* if the RDF used anonymous URIs. Any patterns of node identity in the queried data would still be visible in the query results (which is really all that matters here). Everything would work perfectly, in fact, without needing this explanation. So yes, the SPARQL documents would need a little editing, but this would consist chiefly of deleting unnecessary material. 
> 
> > And if one switched to /locally/ scoped /constants/, then the result set should be empty, following the explanation I gave above, which is again very different from the cited section.
> > 
> > So, if the RDF WG intends to make any changes to the RDF spec concerning the syntactic and semantic properties of blank nodes, then it should also consider notifying the SPARQL WG, so that they can update their current working drafts accordingly. Of course, this would require a major update that breaks backwards-compatibility with SPARQL 1.0. It would also have a strong effect on the new SPARQL 1.1 entailment regimes (http://www.w3.org/TR/sparql11-entailment/), at least for the entailment regimes based on RDF, RDFS and the OWL 2 RDF-Based Semantics, since these are all defined with respect to the original model theories defined in the current RDF Semantics spec, or with respect to the OWL 2 RDF-Based Semantics spec (http://www.w3.org/TR/owl2-rdf-based-semantics/), which itself is based on the current RDF Semantics spec, i.e., they all depend on existential blank node semantics. 
> > 
> > And, as we mention OWL 2, this standard should then perhaps also be revised (or at least its future successors should be changed in a backwards-incompatible way in order to conform to the changed RDF spec). This would have particularly strong consequences for OWL 2 Full (which uses the mentioned OWL 2 RDF-Based Semantics as its semantics), as this language is fully based on the current RDF Semantics spec and additionally includes definitions that heavily assume that blank nodes are seen as existentially quantified variables. This was even more strongly the case for OWL 1 Full, but is still the case for OWL 2 Full (ask, if you are interested in further explanation). But also OWL 2 DL, even though its semantics is /not/ based on the RDF Semantics, still has a notion of "anonymous individuals", which are represented by blank nodes in the RDF mapping of OWL 2, and which happen to be interpreted as existentially quantified variables as well. So, should OWL 2 DL also be changed, or would a further drifting apart of RDF and OWL DL be ok for everybody?
> > 
> > And let's also not forget RIF, at least the specification of RIF-RDF combinations in <http://www.w3.org/TR/2010/REC-rif-rdf-owl-20100622/>. The definition of "satisfaction" of a RIF-RDF combination by a "common-RIF-RDF interpretation"  reuses, for the RDF part, the specification of "RDF satisfaction" as provided by the current RDF semantics specification - that is, it makes use of the existential semantics for blank nodes occurring in the RDF graphs in a RIF-RDF combination.
> > 
> > And also let's not forget about all the books and papers that have been written on the topic, software that has been created, projects, conferences, companies... 
> > 
> > It appears to me that a little change in the semantics of blank nodes would go a long way... :->
> > 
> > Cheers,
> > Michael
> > 
> > ________________________________________
> > From: semantic-web-request@w3.org [semantic-web-request@w3.org] on behalf of Graham Klyne [GK-lists@ninebynine.org]
> > Sent: Thursday, 24 March 2011 10:08
> > To: Dieter Fensel
> > Cc: Enrico Franconi; Pat Hayes; Hugh Glaser; Mark Wallace; Alan Ruttenberg; Reto Bachmann-Gmuer; Ivan Shmakov; Ivan Shmakov; <semantic-web@w3.org>
> > Subject: Re: {Disarmed} Re: blank nodes (once again)
> > 
> > FWIW, my recollection of the working group discussions followed a similar path:
> > that bNodes don't fundamentally add expressive power when making assertions
> > about the world.  I.e. that Skolemization achieves the same effect.  I think it
> > was mainly the convenience (maybe not for logicians!) argument that carried the day.
> > 
> > But I do recall some discussion also about the use of RDF expressions as
> > patterns, a kind of query, in which their logical interpretation might vary.  If
> > that viewpoint once had any merit, I suspect it has been rather overtaken by the
> > subsequent standardization of SPARQL.
> > 
> > I know that I find bNodes convenient when constructing RDF, but also I have
> > found them problematic when implementing inference machinery (by reason of
> > unclear intermediate scope boundaries).  One implementation strategy I'd probably
> > use in future is to replace all bNodes internally by some form of unique
> > identifier (maybe a UUID URI), then map back to bNode when serializing a graph.
> > 
> > So, yes, it is then just a syntactic convenience.  But not one I'd necessarily
> > choose to forego.
> > 
> > #g
> > --
> > 
> > Dieter Fensel wrote:
> >> Dear all,
> >> 
> >> I am not sure it is useful to add another comment and I also
> >> only partially understand the contents of the flow of emails
> >> on this issue. However, I will try, at the risk of looking like a fool.
> >> 
> >> 1) bnodes are a trick to avoid thinking about useful names
> >> in situations where you do not really care about them,
> >> and are used e.g. in implementing lists in RDF. Obviously
> >> they were not really needed but make life easier.
> >> 
> >> 2) Logicians entered the place and started to interpret them as
> >> existentially quantified variables. This is not wrong (since they
> >> are statements about something that exists and has a certain
> >> property), however, it is a somewhat heavy way to interpret a
> >> simple syntactical short-cut.
> >> 
> >> I do not think that RDF wants to forbid interpreting them as names;
> >> it is only that one does not care about the specific name. Maybe a straight-forward
> >> way is to think about them as unique constants, i.e., use the idea
> >> of skolemization. I think this is also in line with a proposal of Pat,
> >> a down-sized version of the Jos & Enrico paper, and in sync with
> >> [1].
> >> 
> >> Alternatively one may simply recommend not using them (or
> >> read these thousand emails before using them).
> >> 
> >> Obviously, I may have missed the point, I may violate the charter, and I
> >> should read 1000 emails more carefully.  Btw, I do not think that the
> >> discussion is uninteresting, but it obviously indicates a problem.
> >> 
> >> [1] G. Yang and M. Kifer: Reasoning about Anonymous Resources
> >> and Meta Statements on the Semantic Web, J. Data Semantics, 2003: 69-97.
> >> 
> >> 
> >> 
> >> At 21:33 20.03.2011, Enrico Franconi wrote:
> >> 
> >>> On 18 Mar 2011, at 22:14, Pat Hayes wrote:
> >>> 
> >>>> As a fallback, I am thinking of writing up a spec-like document
> >>> defining 'ground RDF', to show how much simpler everything is when you
> >>> don't have them. It would cover RDF, RDFS, OWL and SPARQL. What do you
> >>> think?
> >>> 
> >>> In [1] we have formally explored this case.
> >>> --e.
> >>> 
> >>> [1] Jos de Bruijn, Enrico Franconi, Sergio Tessaris (2005). Logical
> >>> Reconstruction of normative RDF. Proc. of the Workshop on OWL
> >>> Experiences and Directions (OWLED 2005), Galway, Ireland, November
> >>> 2005. <http://www.inf.unibz.it/~franconi/papers/owled-05.pdf>
> >> 
> > 
> > --
> > Dipl.-Inform. Michael Schneider
> > Research Scientist, Information Process Engineering (IPE)
> > Tel  : +49-721-9654-726
> > Fax  : +49-721-9654-727
> > Email: michael.schneider@fzi.de
> > WWW  : http://www.fzi.de/michael.schneider
> > ==============================================================================
> > FZI Forschungszentrum Informatik an der Universität Karlsruhe
> > Haid-und-Neu-Str. 10-14, D-76131 Karlsruhe
> > Tel.: +49-721-9654-0, Fax: +49-721-9654-959
> > Stiftung des bürgerlichen Rechts
> > Stiftung Az: 14-0563.1 Regierungspräsidium Karlsruhe
> > Vorstand: Dipl. Wi.-Ing. Michael Flor, Prof. Dr. rer. nat. Ralf Reussner,
> > Prof. Dr. rer. nat. Dr. h.c. Wolffried Stucky, Prof. Dr. rer. nat. Rudi Studer
> > Vorsitzender des Kuratoriums: Ministerialdirigent Günther Leßnerkraus
> > ==============================================================================
> > 
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> 
Received on Thursday, 24 March 2011 15:14:01 UTC