- From: Bob MacGregor <bmacgregor@siderean.com>
- Date: Tue, 21 Dec 2004 08:02:34 -0800
- To: Pat Hayes <phayes@ihmc.us>
- CC: bgrosof@mit.edu, "Eric Prud'hommeaux" <eric@w3.org>, public-rdf-dawg-comments@w3.org
- Message-ID: <41C8491A.1060503@siderean.com>
This issue of UNSAID is really much bigger than SPARQL -- it affects a
significant fraction of Web users. If my understanding is correct, all
of the RuleML users will be taking the Deductive Database / SQL mindset,
which assumes closed world semantics.

The key part of the discussion involves whether or not a system can
trust/assume that closed world semantics applies to some graph / RDF
dataset. While it's certainly the case that nothing in RDF sanctions
that assumption, only a small amount of machinery is needed, namely
named graphs and a single predicate that can assert that a particular
graph has closed-world semantics. Possibly, the predicate would be more
specific, asserting that a particular property (and its subproperties)
is closed within the named graph.

On the other hand, until named graphs become officially blessed (instead
of just something that everyone recognizes would be a major step
forward) this solution might not be viable for SPARQL. Then again, our
implementation of SPARQL includes UNSAID already -- no sense in waiting
around for what will inevitably arrive sooner or later.

Cheers, Bob

Pat Hayes wrote:

>> On 2004-12-18 21:58:34 -0800, Pat Hayes wrote, at
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2004OctDec/0534.html:
>>
>> > The message that started the thread
>> > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2004Nov/0016.html
>> > has an example that illustrates the point in its use case 2, the
>> > financial institution that must not send its prospectus to
>> > customers in the US or Canada. For this institution to rely on an
>> > UNSAID query to ensure this rule was obeyed would be very risky,
>> > since in general the RDF content against which the query is being
>> > evaluated is not known to be complete with regard to citizenship
>> > information.
>> > It cannot be so known, except by special access to
>> > off-web information, as there are currently no Web protocols for
>> > communicating the fact that a source is complete in this way.
>>
>> Indeed. The same applies to the truthfulness of the information
>> contained in the RDF graph, or to the trustworthiness of information
>> about the graph's truthfulness that's transmitted inside the
>> protocol. That's, obviously, not a reason to declare RDF and SPARQL
>> "very risky", and to drop them.
>
> There is a key difference, however. If an agency publishes some
> RDF/OWL content which asserts that, say, Joe is an American citizen,
> then the specs do indeed establish that they are asserting this, so
> questions of trustworthiness and responsibility for published claims
> can be brought into the area of rational discussion. This does not
> apply to negation-as-failure. If I publish some RDF/OWL which
> describes some facts about citizenship but which fails to mention that
> Joe is an American citizen, the specs insist that I have /not/ thereby
> asserted that Joe is not a citizen. If you draw that conclusion, /you/
> do so at your risk, and I, the publisher, cannot be held responsible
> for any consequences of that inferential act by you. It would be a
> dangerous (IMO) mistake for SPARQL to imply in its design that this
> kind of (negation-by-failure) inference was intended or meant to be
> supported by an RDF or OWL reasoner; it could (will) be used to
> deflect responsibility for mistakes to the wrong agency.
>
>> The point of applying UNSAID in the way described in use case 2 is,
>> precisely, that the graph that's queried is assumed to be
>> sufficiently complete for the querying party's purposes.
>
> But that assumption is invisible on the semantic web. My point is that
> there is no way for a software agent to be told that a graph is
> 'sufficiently complete' in the required sense. (No way to transmit
> that using http, if you like.)
> And recall that the intended goal of
> the semantic web is to allow /software agents/ to make rational
> decisions. If a designer really wants to use this kind of reasoning on
> a source that it knows to be complete, I believe it is quite easy to
> do so without having UNSAID in the querying protocol. For example, the
> application can explicitly query for the rejection case and reject the
> instance if it finds the relevant triple; then it has performed an
> invalid inference, but has done so by using valid protocols. My
> quarrel is not with the reasoning strategy (though I have my doubts
> about it) but with the incorporation of an invalid reasoning process
> into the querying protocols.
>
> A related matter. UNSAID refers simply to the absence of a triple. But
> RDF supports entailment of triples by other triples, and such
> entailments become quite complex in RDFS and extremely complex in OWL;
> and RDF/XML is required by the various W3C WG charters to be the
> interchange syntax for these more complex languages. Suppose an
> OWL/RDF or RDFS triple store does not contain a certain triple, but
> that triple can be inferred by valid OWL or RDFS reasoning from
> triples that it does contain. In this case, a reasoner that relied on
> UNSAID to implement negation-by-failure would become logically
> incoherent, not merely mistaken: quite simple inputs would cause it to
> become enmeshed in contradictions. (It might be better to have
> something like UNIMPLIED rather than UNSAID, particularly as an RDF
> graph can be reasonably taken to be 'saying' any RDF-valid consequence
> of itself.)
>
>> The judgment whether or not this kind of assumption is "very risky"
>> (whatever this means) is not the protocol designer's to make, but
>> strictly a business decision made by the party that applies the
>> protocol.
>
> The anticipated uses of SW technology require such decisions to be
> made by software, not by designers of software.
> Right now there is no
> way to transmit the necessary information to a piece of software. (I
> wish there were: the lack of this ability is a notable failure of the
> RDF/OWL effort, I now think, for which I must bear part of the
> responsibility.)
>
>> In fact, the word "complete" is ambiguous here: While a graph may be
>> incomplete, in the sense that it lacks facts that are out there
>> (this is the notion of "incompleteness" that you apply to use case
>> 2),
>
> Lacks a particular kind of fact. I agree that the notion of
> 'completeness' here is ambiguous; that is part of the technical problem.
>
>> the same graph may quite well be the querying party's complete
>> knowledge of facts at some point of time. In this context, UNSAID
>> also serves to help a party know what it does not know.
>
> I agree that is a potentially useful thing to be able to query.
> However, the very fact that your use cases relied on invalid reasoning
> (and the draft write-up explicitly mentioned invalid reasoning
> patterns) makes me worry that it will not be used in this way, but
> will almost certainly be used immediately and enthusiastically in
> invalid ways. And that this will produce a dangerous kind of
> inference-rot at a very basic layer of the semantic web.
>
>> Here's another use case, to illustrate this: Consider a party (say,
>> our bank) that knows it has partial information stored in an RDF
>> graph -- e.g., some social information (say, the grandmother's
>> maiden name) is only associated with some of the subjects (say, of
>> class account holder) in the graph. The party needs to collect this
>> information for all subjects of class account holder (say, due to
>> stricter money laundering legislation). UNSAID enables the bank to
>> acquire the missing information from those account holders for which
>> it is needed, and later on also enables sanctions against account
>> holders who do not provide it.
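
[Editorial illustration, not part of the original message.] The bank use case quoted just above can be sketched over a toy in-memory graph. This is only a minimal sketch under stated assumptions: the graph is modelled as a plain Python set of (subject, predicate, object) tuples, the helper `subjects_missing` is hypothetical, and every `ex:` name is invented for illustration. It reports what this particular graph does not say; it does not conclude that the missing facts are false, and it ignores any triples that RDFS or OWL entailment might add.

```python
# Toy graph: a set of (subject, predicate, object) triples.
# All identifiers below (ex:AccountHolder, ex:grandmotherMaidenName, ...)
# are hypothetical and exist only for this illustration.
RDF_TYPE = "rdf:type"

graph = {
    ("ex:alice", RDF_TYPE, "ex:AccountHolder"),
    ("ex:alice", "ex:grandmotherMaidenName", "Smith"),
    ("ex:bob", RDF_TYPE, "ex:AccountHolder"),
    ("ex:carol", RDF_TYPE, "ex:AccountHolder"),
    ("ex:carol", "ex:grandmotherMaidenName", "Jones"),
    ("ex:dave", RDF_TYPE, "ex:Employee"),
}


def subjects_missing(graph, cls, prop):
    """Return subjects typed as `cls` for which `graph` holds no `prop` triple.

    The result describes only what is absent from this graph (what is
    "unsaid"); it makes no claim that the property has no value.
    """
    # Subjects explicitly typed as members of the class.
    members = {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}
    # Subjects that have at least one triple with the given property.
    described = {s for (s, p, o) in graph if p == prop}
    return sorted(members - described)


missing = subjects_missing(graph, "ex:AccountHolder", "ex:grandmotherMaidenName")
print(missing)  # ['ex:bob']
```

Here the bank would learn that it needs to ask `ex:bob` for the information. The sketch deliberately stops at "not stated in this graph", which is the reading of UNSAID that the reply below accepts.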
>
> That is an excellent use case, I agree: using UNSAID to find out what
> is not said. I wish they were all like this. But is UNSAID really
> necessary for this? Or is it only a convenience? If it were possible
> to handle cases like this without using UNSAID explicitly, I would
> prefer that SPARQL require users to use a workaround.
>
>> > If SPARQL contains UNSAID then it will be inconsistent with any
>> > account of meaning which is based on the RDF/RDFS/OWL normative
>> > semantics. This will not render SPARQL unusable, but it will place it
>> > outside the 'semantic web layer cake' and probably lead to the
>> > eventual construction of a different, and rival, query language for
>> > use by Web reasoners.
>>
>> Conversely, standardization of a too restricted version of SPARQL
>> (e.g., one without UNSAID) will drive applications to either
>> competing query languages, or to incompatible extensions that
>> provide the expressivity they need.
>
> That would be a better outcome, IMO, than having an RDF query language
> in widespread use which would weaken the inferential foundations of
> much of the semantic web. If the basic RDF protocols do not respect
> the RDF semantics, then there really is no point in continuing with
> the semantic web effort.
>
>> Note that this risk is not created by specifying a full version of
>> SPARQL, including UNSAID, and by additionally profiling some subset
>> of it that satisfies whatever assumptions you want to be able to
>> make.
>
> In an ideal world, everyone would read all the warnings in the spec
> and obey them rationally. However, a spec designer has to consider the
> real world. For example, it would be quite rational to allow blank
> nodes in query patterns; but we find in practice that if they are
> allowed, people often misuse them, or expect them to apply in ways
> that cannot be supported, or confuse them with query variables.
> So it
> is simpler, and better, to just not allow them, even though in some
> cases that requires users to express themselves more obliquely and use
> work-arounds. I feel strongly that UNSAID is in this category of a
> useful-if-you-know-exactly-how but dangerous-and-easy-to-misuse kind
> of a feature, and one that is better omitted than included. And I feel
> this way even more strongly when the email thread that suggested it,
> and the draft write-up of the language feature itself, both misuse it
> in exactly the dangerous way.
>
> Pat
>
>> Regards,
>> --
>> Thomas Roessler, W3C <tlr@w3.org>
>
> --
> ---------------------------------------------------------------------
> IHMC                    (850)434 8903 or (650)494 3973   home
> 40 South Alcaniz St.    (850)202 4416   office
> Pensacola               (850)202 4440   fax
> FL 32502                (850)291 0667   cell
> phayes@ihmc.us          http://www.ihmc.us/users/phayes

--
Bob MacGregor
Chief Scientist
Siderean Software Inc
390 North Sepulveda Blvd., Suite 2070
El Segundo, CA 90245
bmacgregor@siderean.com
tel: +1-310 647-4266
fax: +1-310-647-3470
Received on Tuesday, 21 December 2004 16:03:35 UTC