- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Tue, 2 Oct 2007 10:21:30 +0100
- To: Reto Bachmann-Gmür <rbg@talis.com>
- Cc: public-owl-dev@w3.org
On 2 Oct 2007, at 09:30, Reto Bachmann-Gmür wrote:

> Hi Bijan,
>
> You wrote:
>
>> However, the common, deployed semantics for BNodes is that they are
>> local names, not existential variables. SPARQL treats them that way.
> and
>> If so, then you are pro-existential variable semantics. If not, then
>> you are sane, er, in favor of a somewhat different approach like most
>> of the world.
> I'm wondering why you think most of the world is violating
> rdf-semantics,

Because it's true! The *best* reason to believe something.

"Violating" is a bit strong. "Ignoring" is better. It's just that most of the time ignoring is "compatible", in some sense, but the proof comes out when you put things to the test. In RDF, given the weakness of the language, the tests are easier to avoid. But, for example, no OWL engine treats bnodes the same as someValuesFrom statements.

> do you have any stats or at least examples? Note that a tool
> doesn't need to enforce lean-graphs all of the time,

True. But how many leaners are out there? Production quality? How many entailment testers are out there? Production quality? How many applications make use of either? How many treat triples from documents as stable and sacrosanct? (I mean, not eliminating the *easy* cases?)

Usually it takes me anywhere from 15 minutes to a couple of hours to explain the existential semantics plus its implications in a variety of contexts. And I'm always having to explain it ;)

> but for a tool compliant with rdf-semantics the following graph
>
> eg:joesText foaf:maker [ foaf:name "jo"].
>
> expresses the same content as
>
> eg:joesText foaf:maker [ foaf:name "jo"].
> eg:joesText foaf:maker [ foaf:name "jo"].

So, how do you test that? In my experience, if a store *doesn't* treat these as distinct, you get bug complaints.

> Not treating b-nodes as existential variables would mean that the
> union of a graph with itself would be a different graph.

Well, it's clearly *syntactically* a different graph, in a way that:
    s p o
and
    s p o.
    s p o.
are not.

> Or for an aggregator:
> whenever we aggregate the first graph we add two triples to our
> aggregated graph, and if I got your "sane" interpretation right
> eg:joesText has a new maker, which without further knowledge is not
> considered owl:sameAs to any of the existing foaf:makers.

Note your language leans heavily toward the local name interpretation. The bnode is *not an entity* but a variable.

I could understand if stores generally didn't lean for efficiency reasons, but they lean graphs of syntactically identical triples.
    s p o.
    s p _:x.
(whereas databases may not, for the sake of efficiency).

SPARQL's "distinct" mechanism *does not lean answers*:
    http://www.w3.org/TR/rdf-sparql-query/#modDistinct
    http://www.w3.org/TR/rdf-sparql-query/#BGPsparqlBNodes

> I was wondering why you think SPARQL treats b-nodes as local
> names, afaik sparql doesn't guarantee that a query against two
> graphs expressing the same content yields the same result but it
> doesn't require an implementation to keep redundant b-nodes either.

"SPARQL uses the subgraph match criterion to determine the solutions of a basic graph pattern. There is one solution for each distinct pattern instance mapping from the basic graph pattern to a subset of the active graph."

That's the requirement to keep redundant bnodes. You can, of course, lean your graph in a separate operation. But note that this doesn't solve all issues.

"This is optimized for ease of computation rather than redundancy elimination. It allows query results to contain redundancies even when the active graph of the dataset is lean, and it allows logically equivalent datasets to yield query results."

I think the last should be "to yield different query results".

If you are existentially inclined, then a result set can contain bnode redundancies *even if* the query graph is lean. It's trivial, eh?
    _:x p o.
    _:y p z.
    select ?a where {?a p ?y}
Result set:
    ?a
    _:x
    _:y
Which gives you no more information than:
    _:x

If you use construct, you'll get a non-lean graph. (See
<http://www.cs.man.ac.uk/~bparsia/2006/row-tutorial/#slide32>. Note there seems to be a bug on slide 37; parts are just garbled; I'll try to clean it up later today.)

Plus, people just don't think of bnodes as ever redundant. I can't find the email right now, but a DAWG member suggested that a pattern like:
    _:x rdf:type Invoice; hasItem 4.
    _:y rdf:type Invoice; hasItem 3.
would safely indicate that you had (only) two invoices. Note that this *is* lean.

> More generally, what's your motivation to change the semantic of
> b-nodes,

Because they cause lots of problems and their semantics offer no gains. For example, with existential bnodes, SPARQL query answering for RDFS is *NP-Complete* in *DATA COMPLEXITY*. That should be a scary fact for anyone interested in scalability. How is having true existential semantics worth that?

> if you don't like/need existential variables why don't you just
> assign URIs (urn:uuid) to your nodes?

I do. But lots of people do not and I have to deal with their data. Plus when I write specs and software, if I have to treat:
    s p _:x.
as equivalent to:
    s:[...someValuesFrom...]
then many things become much, much, much harder. Interoperability becomes harder.

Now there are many ways to compromise. The SPARQL spec works hard and carefully to keep things at least *seeming* compatible. But really, it's a waste of effort and confusing.

> What's the particularity of "local" names, do they have to change
> when they change context? what's the advantage of it?

I have a proposal in mind that preserves just about all used behavior and makes life much easier, but I've not written it out.

Leaning goes out the window. Yay. Keep it, if you like, as an unsanctioned operation alongside other graph munging.

Parser behavior stays the same, though we have a "merge" vs. "union" option, since some of the time you want to keep nodeids in different graphs distinct and sometimes you don't. Merge behavior (distinct) is the default, though we could work out a proposal that allowed for roundtripping nodeids.

Cheers,
Bijan.
Received on Tuesday, 2 October 2007 09:20:12 UTC
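
Illustrative note on the examples in the message above: the following is a minimal Python sketch, not code from the thread and not any store's actual implementation. It assumes triples are plain tuples of strings, that blank nodes are the strings starting with "_:", and that variables start with "?". The helper names match_pattern and is_lean are hypothetical. match_pattern does naive subgraph matching for a single triple pattern (one solution per matching triple, which is the "local name" reading), and is_lean brute-forces the existential reading by looking for a blank-node mapping that sends the graph onto a proper subgraph of itself.

```python
from itertools import product


def is_bnode(term):
    return term.startswith("_:")


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, graph):
    """Subgraph matching for one triple pattern: one solution per matching triple."""
    solutions = []
    for triple in graph:
        binding = {}
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # A repeated variable must bind to the same term each time.
                if binding.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            solutions.append(binding)
    return solutions


def is_lean(graph):
    """True if no blank-node mapping sends the graph to a proper subgraph of itself."""
    graph = set(graph)
    bnodes = sorted({t for tr in graph for t in tr if is_bnode(t)})
    # Candidate images: any term used in subject or object position.
    terms = sorted({t for tr in graph for t in (tr[0], tr[2])})
    for image in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, image))
        mapped = {tuple(mapping.get(t, t) for t in tr) for tr in graph}
        if mapped < graph:  # strict subset: the graph has a redundant part
            return False
    return True


# The two-triple example from the message.
query_graph = [("_:x", "p", "o"), ("_:y", "p", "z")]
answers = [s["?a"] for s in match_pattern(("?a", "p", "?y"), query_graph)]
print(answers)               # ['_:x', '_:y'], two answers from a lean graph
print(is_lean(query_graph))  # True

# The foaf:maker example: one copy versus the same content stated twice.
single = [("eg:joesText", "foaf:maker", "_:a"), ("_:a", "foaf:name", '"jo"')]
doubled = single + [("eg:joesText", "foaf:maker", "_:b"), ("_:b", "foaf:name", '"jo"')]
print(is_lean(single), is_lean(doubled))  # True False

# The invoice pattern: lean, because the two hasItem values differ.
invoices = [
    ("_:x", "rdf:type", "Invoice"), ("_:x", "hasItem", "4"),
    ("_:y", "rdf:type", "Invoice"), ("_:y", "hasItem", "3"),
]
print(is_lean(invoices))  # True
```

The lean check enumerates every possible blank-node mapping, so it is only usable on toy graphs like these. It is meant to make the two readings concrete, not to suggest how a production store would or should lean.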