- From: Reto Bachmann-Gmür <rbg@talis.com>
- Date: Tue, 02 Oct 2007 13:29:44 +0200
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- CC: public-owl-dev@w3.org
Bijan Parsia wrote:
> On 2 Oct 2007, at 09:30, Reto Bachmann-Gmür wrote:
> [...]
>
> "Violating" is a bit strong. "Ignoring" is better. It's just that most
> of the time ignoring is "compatible", in some sense, but the proof
> comes out when you put things to the test. In RDF, given the weakness
> of the language, the tests are easier to avoid. But, for example, no
> owl engine treats bnodes the same as somevalues statements.

As I already wrote, it's perfectly OK for an application to keep a graph
non-lean. In fact, an API to draw graphs (like Jena) may need to keep
redundant statements because the API user may still be adding properties
to those nodes.

>> do you have any stats or at least examples? Note that a tool doesn't
>> need to enforce lean-graphs all of the time,
>
> True. But how many leaners are out there? Production quality? How many
> entailment testers are out there? Production quality. How many
> applications make use of either. How many treat triples from documents
> as stable and sacrocent? (I mean, not eliminating the *easy* cases?)

I guess that many applications don't need to care, either because they
are not aggregating b-nodes from external sources or because they deal
with b-nodes that are grounded by functional or inverse functional
properties, so they can smush them without considering the whole
extension. Other applications do in fact perform some partial
leanification, because users do complain when the rss:items in their
aggregation get (or appear to get) a new maker whenever that feed is
aggregated. At least a partial leanification is done by tools using
graph decomposition (such as MSGs [1] or RDF Molecules [2]).

> Usually it takes me anywhere from 15 minutes to a couple of hours to
> explain the existential semantics plus it's implications in a variety
> of contexts. And I'm always having to explain it ;)

We all deal with existential semantics in daily life and natural language.
Sara says: "there's a cat in the garden"
Peter says: "there's a cat in the garden"

If, after hearing and trusting Sara and Peter, someone asks us what there
is in the garden, we would usually not say "there's a cat and a cat in
the garden"; we would usually never have created a non-lean graph in our
minds in the first place.

Or, in a more complex situation:

Sara says: "there's a cat in the garden"
Peter says: "there's a green cat in the garden"

After trusting those two assertions, most people would summarize what
there is in the garden as "a green cat", and these are mostly people who
never (explicitly) had existential semantics explained to them.

>> but for a tool complaint with rdf-semanrics the following graph
>>
>> eg:joesText foaf:maker [ foaf:name "jo"].
>>
>> expresses the same content as
>>
>> eg:joesText foaf:maker [ foaf:name "jo"].
>> eg:joesText foaf:maker [ foaf:name "jo"].
>
> So, how do you test that?

In the example, the two graphs contain an identical set of MSGs / RDF
Molecules, but it's true that you can have graphs which decompose into
non-lean components, or into components which are made obsolete by the
union of other components.

> In my experience, if a store *doesn't* treat these as distinct, you
> get bug complaints.

And at least an RFE when you *do* treat them as distinct. It depends
solely on whether your users want a store for semantic content or a store
for non-lean graphs. In most cases they would be happy with either. Some
users of the semantic content store may use it wrongly and get unexpected
results (and post your bug report); some users of the store for non-lean
graphs will eventually get tired of removing redundancies using
ontology-specific knowledge and heuristics and move to a semantic content
store.

[...]
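To make the "how do you test that?" question concrete, here is a minimal
brute-force sketch in Python. It is not taken from any existing tool: the
triple-as-tuple encoding, the "_:" prefix for b-nodes and the function
names (images, entails, lean) are all assumptions of this illustration.
It tries every mapping of b-nodes to terms, so it is exponential in the
number of b-nodes and only usable on tiny graphs or small components.

```python
from itertools import product


def is_bnode(term):
    """Blank nodes are written '_:label' (a convention of this sketch)."""
    return term.startswith("_:")


def images(h, g):
    """Yield every image of h under a mapping of h's b-nodes to terms of g."""
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        yield {tuple(mapping.get(t, t) for t in triple) for triple in h}


def entails(g, h):
    """Simple entailment: some instance of h is a subgraph of g."""
    return any(image <= g for image in images(h, g))


def lean(g):
    """Shrink g until no instance of it is a proper subgraph of itself."""
    g = set(g)
    while True:
        smaller = next((img for img in images(g, g) if img < g), None)
        if smaller is None:
            return g
        g = smaller


# The foaf:maker example: once versus twice.
joe_once = {
    ("eg:joesText", "foaf:maker", "_:a"),
    ("_:a", "foaf:name", '"jo"'),
}
joe_twice = {
    ("eg:joesText", "foaf:maker", "_:b"),
    ("_:b", "foaf:name", '"jo"'),
    ("eg:joesText", "foaf:maker", "_:c"),
    ("_:c", "foaf:name", '"jo"'),
}

# Sara's cat (_:a) and Peter's green cat (_:b).
cats = {
    ("_:a", "rdf:type", "eg:Cat"),
    ("_:a", "eg:in", "eg:garden"),
    ("_:b", "rdf:type", "eg:Cat"),
    ("_:b", "eg:in", "eg:garden"),
    ("_:b", "eg:color", "eg:green"),
}

same_content = entails(joe_once, joe_twice) and entails(joe_twice, joe_once)
green_cat_only = lean(cats)
```

Under these assumptions, same_content is True (each graph simply entails
the other), lean(joe_twice) has two triples, and green_cat_only keeps
just the three triples about the green cat.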
>> Or for an aggregator:
>> whenever we aggregate the first graph we add two triples to our
>> aggregated graph, and if I got your "sane" interpretation right
>> eg:joesText has a new maker, which without further knowledge is not
>> considered to ow:sameAs to any of the exitisting foaf:makerS.
>
> Note your language leans heavily toward the local name interpretation.

Could it be that you missed that, after 'if I got your "sane"
interpretation right', I was trying to paraphrase your position?

> The bnode is *not an entity* but a variable.

Agreed.

> I could understand if stored generally didn't lean for efficiency
> reasons, but they lean graphs of syntactically identicial triples.
> s p o.
> s p _:x.

Not leanifying for efficiency reasons is OK, and doing some leanification
(where this can be done cheaply) is OK as well, but complete
leanification is best when it comes to having the most compact expression
of knowledge and thus the most valuable triples. Or, to go back to the
cat(s): after being told "there's a cat in the garden" and "Fritz is in
the garden, Fritz is a cat", a cheaper edition of our house robot would
summarize the situation as "Fritz and a cat are in the garden, Fritz is a
cat", while the slightly more expensive edition of our RDF-based robot
would say "The cat Fritz is in the garden".

[cutting SPARQL part. Summary: SPARQL works on lean and on non-lean
graphs]

> Plus, people just don't think of bnodes as ever redundant. I can't
> find the email right now, but a DAWG member suggested that a pattern
> like:
>
> _:x rdf:type Invoice; hasItem 4.
> _:y rdf:type Invoice; hasItem 3.
>
> Would safely indicate that you had (only) two invoices. Note that this
> *is* lean.

If hasItem were a functional property, then you could safely conclude
that there are at least two invoices.

>> More generally, what's your motivation to change the semantic of
>> b-nodes,
> Because they cause lots of problems and their semantics offer no
> gains.
> For example, with existential Bnodes sparql query answering for
> RDFS is *NP-Complete* in *DATA COMPLEXITY*.

Subgraph isomorphism is in NP even without existential variables in the
graph.

> That should be a scarey fact for anyone interested in scalability.

Real-world graphs decompose into small components (especially when
taking functional and inverse functional properties into account, as in
RDF Molecules), so many of the operations which would be very expensive
on huge graphs consisting essentially of bnodes become quite cheap.

> How is having true existential semantic worth that?

It is, because the tension is between b-nodes and centralized
ontological naming authorities. To have interoperable descriptions of
the same universe without b-nodes, we would need a central authority
assigning names. Centralization does make computation easier, at least
for what the tax office needs to compute; that's probably why most
designed social systems are centralized.

>> if you don't like/need existential variables why don't you just
>> assign URIs (urn:uuid) to you nodes?
>
> I do. But lots of people do not and I have to deal with their data.
> Plus when I write specs and software, if I have to treat:
> s p _:x.
>
> as equivalent to:
> s:[...someValuesFrom...]
>
> then many things become much, much, much harder. Interoperability
> becomes harder.

Decentralized interoperability only becomes possible with your great,
great, great effort. But it's worth it :)

Cheers,
reto

1. http://www.dbin.org/RDFContextTools.php
2. www.ksl.stanford.edu/people/pp/papers/Ding_ISWC_2005.pdf

--
Reto Bachmann-Gmür
Talis Information Limited
Received on Tuesday, 2 October 2007 11:29:54 UTC