- From: Pat Hayes <phayes@ihmc.us>
- Date: Mon, 20 Dec 2004 10:12:02 -0800
- To: Thomas Roessler <tlr@w3.org>
- Cc: Giles Hogben <giles.hogben@jrc.it>, Rigo Wenning <rigo@w3.org>, Eric Prud'hommeaux <eric@w3.org>, public-rdf-dawg-comments@w3.org
- Message-Id: <p06001f25bdecbb263964@[192.168.1.7]>
>On 2004-12-18 21:58:34 -0800, Pat Hayes wrote, at
>http://lists.w3.org/Archives/Public/public-rdf-dawg/2004OctDec/0534.html:
>
>> The message that started the thread
>>
>>http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2004Nov/0016.html
>> has an example that illustrates the point in its use case 2, the
>> financial institution that must not send its prospectus to
>> customers in the US or Canada. For this institution to rely on an
>> UNSAID query to ensure this rule was obeyed would be very risky,
>> since in general the RDF content against which the query is being
>> evaluated is not known to be complete with regard to citizenship
>> information. It cannot be so known, except by special access to
>> off-web information, as there are currently no Web protocols for
>> communicating the fact that a source is complete in this way.
>
>Indeed. The same applies to the truthfulness of the information
>contained in the RDF graph, or to the trustworthiness of information
>about the graph's truthfulness that's transmitted inside the
>protocol. That's, obviously, not a reason to declare RDF and SPARQL
>"very risky", and to drop them.

There is a key difference, however. If an agency publishes some RDF/OWL content which asserts that, say, Joe is an American citizen, then the specs do indeed establish that they are asserting this, so questions of trustworthiness and responsibility for published claims can be brought into the area of rational discussion. This does not apply to negation-as-failure. If I publish some RDF/OWL which describes some facts about citizenship but which fails to mention that Joe is an American citizen, the specs insist that I have not thereby asserted that Joe is not a citizen. If you draw that conclusion, you do so at your risk, and I, the publisher, cannot be held responsible for any consequences of that inferential act by you.
It would be a dangerous (IMO) mistake for SPARQL to imply in its design that this kind of (negation-by-failure) inference was intended or meant to be supported by an RDF or OWL reasoner; it could (will) be used to deflect responsibility for mistakes to the wrong agency.

>The point of applying UNSAID in the way described in use case 2 is,
>precisely, that the graph that's queried is assumed to be
>sufficiently complete for the querying party's purposes.

But that assumption is invisible on the semantic web. My point is that there is no way for a software agent to be told that a graph is 'sufficiently complete' in the required sense. (No way to transmit that using HTTP, if you like.) And recall that the intended goal of the semantic web is to allow software agents to make rational decisions.

If a designer really wants to use this kind of reasoning on a source that it knows to be complete, I believe it is quite easy to do so without having UNSAID in the querying protocol. For example, the application can explicitly query for the rejection case and reject the instance if it finds the relevant triple; then it has performed an invalid inference, but has done so by using valid protocols. My quarrel is not with the reasoning strategy (though I have my doubts about it) but with the incorporation of an invalid reasoning process into the querying protocols.

A related matter. UNSAID refers simply to the absence of a triple. But RDF supports entailment of triples by other triples, and such entailments become quite complex in RDFS and extremely complex in OWL; and RDF/XML is required by the various W3C WG charters to be the interchange syntax for these more complex languages. Suppose an OWL/RDF or RDFS triple store does not contain a certain triple, but that triple can be inferred by valid OWL or RDFS reasoning from triples that it does contain.
In this case, a reasoner that relied on UNSAID to implement negation-by-failure would become logically incoherent, not merely mistaken: quite simple inputs would cause it to become enmeshed in contradictions. (It might be better to have something like UNIMPLIED rather than UNSAID, particularly as an RDF graph can be reasonably taken to be 'saying' any RDF-valid consequence of itself.)

>The judgment whether or not this kind of assumption is "very risky"
>(whatever this means) is not the protocol designer's to make, but
>strictly a business decision made by the party that applies the
>protocol.

The anticipated uses of SW technology require such decisions to be made by software, not by designers of software. Right now there is no way to transmit the necessary information to a piece of software. (I wish there were: the lack of this ability is a notable failure of the RDF/OWL effort, I now think, for which I must bear part of the responsibility.)

>In fact, the word "complete" is ambiguous here: While a graph may be
>incomplete, in the sense that it lacks facts that are out there
>(this is the notion of "incompleteness" that you apply to use case
>2),

Lacks a particular kind of fact. I agree that the notion of 'completeness' here is ambiguous; that is part of the technical problem.

>the same graph may quite well be the querying party's complete
>knowledge of facts at some point of time. In this context, UNSAID
>also serves to help a party know what it does not know.

I agree that is a potentially useful thing to be able to query. However, the very fact that your use cases relied on invalid reasoning (and the draft write-up explicitly mentioned invalid reasoning patterns) makes me worry that it will not be used in this way, but will almost certainly be used immediately and enthusiastically in invalid ways. And that this will produce a dangerous kind of inference-rot at a very basic layer of the semantic web.
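
As a rough illustration of the entailment point above: the sketch below contrasts a literal-absence test with an absence-from-the-closure test. Everything in it is an assumption made for the example: the ex: names, the choice to close the graph under only two RDFS rules (rdfs9 and rdfs11), and the helper functions unsaid and unimplied, which are not SPARQL features.

```python
# Triples are plain (subject, predicate, object) tuples.
# All ex: names below are hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(graph):
    """Return the graph plus what RDFS rules rdfs9 and rdfs11 derive from it."""
    closure = set(graph)
    while True:
        derived = set()
        for (c, p, d) in closure:
            if p != SUBCLASS_OF:
                continue
            for (s2, p2, o2) in closure:
                # rdfs11: c subClassOf d, d subClassOf e  =>  c subClassOf e
                if p2 == SUBCLASS_OF and s2 == d:
                    derived.add((c, SUBCLASS_OF, o2))
                # rdfs9: x type c, c subClassOf d  =>  x type d
                if p2 == RDF_TYPE and o2 == c:
                    derived.add((s2, RDF_TYPE, d))
        if derived <= closure:
            return closure
        closure |= derived


def unsaid(graph, triple):
    """True when the triple is not literally present in the graph."""
    return triple not in graph


def unimplied(graph, triple):
    """True when the triple is not in the (partial) RDFS closure of the graph."""
    return triple not in rdfs_closure(graph)


graph = {
    ("ex:joe", RDF_TYPE, "ex:USCitizen"),
    ("ex:USCitizen", SUBCLASS_OF, "ex:RestrictedRecipient"),
}
probe = ("ex:joe", RDF_TYPE, "ex:RestrictedRecipient")

print(unsaid(graph, probe))     # True: the triple is not written down
print(unimplied(graph, probe))  # False: the graph entails it via rdfs9
```

A reasoner that treated the first result as "Joe is not a restricted recipient" would contradict what the same graph entails. Note that even the second test reports only what this graph fails to entail; it does not make the graph complete.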
>Here's another use case, to illustrate this: Consider a party (say,
>our bank) that knows it has partial information stored in an RDF
>graph -- e.g., some social information (say, the grandmother's
>maiden name) is only associated with some of the subjects (say, of
>class account holder) in the graph. The party needs to collect this
>information for all subjects of class account holder (say, due to
>stricter money laundering legislation). UNSAID enables the bank to
>acquire the missing information from those account holders for which
>it is needed, and later on also enables sanctions against account
>holders who do not provide it.

That is an excellent use case, I agree: using UNSAID to find out what is not said. I wish they were all like this. But is UNSAID really necessary for this? Or is it only a convenience? If it were possible to handle cases like this without using UNSAID explicitly, I would prefer that SPARQL require users to use a workaround.

>> If SPARQL contains UNSAID then it will be inconsistent with any
>> account of meaning which is based on the RDF/RDFS/OWL normative
>> semantics. This will not render SPARQL unusable, but it will place it
>> outside the 'semantic web layer cake' and probably lead to the
>> eventual construction of a different, and rival, query language for
>> use by Web reasoners.
>
>Conversely, standardization of a too restricted version of SPARQL
>(e.g., one without UNSAID) will drive applications to either
>competing query languages, or to incompatible extensions that
>provide the expressivity they need.

That would be a better outcome, IMO, than having an RDF query language in widespread use which would weaken the inferential foundations of much of the semantic web. If the basic RDF protocols do not respect the RDF semantics, then there really is no point in continuing with the semantic web effort.
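
One possible shape for the workaround asked about in the bank use case above is sketched here. It uses only positive pattern matches and leaves the set difference to the application. The graph contents, the ex: names, and the helpers match and holders_without are hypothetical and do not come from any SPARQL draft.

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def holders_without(graph, prop):
    """Account holders for which this graph records no value of prop."""
    # Two ordinary positive queries...
    holders = {t[0] for t in match(graph, p="rdf:type", o="ex:AccountHolder")}
    described = {t[0] for t in match(graph, p=prop)}
    # ...and a difference taken by the application, outside the query protocol.
    return holders - described


bank_graph = {
    ("ex:alice", "rdf:type", "ex:AccountHolder"),
    ("ex:bob", "rdf:type", "ex:AccountHolder"),
    ("ex:alice", "ex:grandmotherMaidenName", "Smith"),
}

print(holders_without(bank_graph, "ex:grandmotherMaidenName"))  # {'ex:bob'}
```

The result says only that this store records no such value for ex:bob. Any further conclusion is drawn by the application, at its own risk.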
>Note that this risk is not created by specifying a full version of
>SPARQL, including UNSAID, and by additionally profiling some subset
>of it that satisfies whatever assumptions you want to be able to
>make.

In an ideal world, everyone would read all the warnings in the spec and obey them rationally. However, a spec designer has to consider the real world. For example, it would be quite rational to allow blank nodes in query patterns; but we find in practice that if they are allowed, people often misuse them, or expect them to apply in ways that cannot be supported, or confuse them with query variables. So it is simpler, and better, to just not allow them, even though in some cases that requires users to express themselves more obliquely and use work-arounds. I feel strongly that UNSAID is in this category of a useful-if-you-know-exactly-how but dangerous-and-easy-to-misuse kind of a feature, and one that is better omitted than included. And I feel this way even more strongly when the email thread that suggested it, and the draft write-up of the language feature itself, both misuse it in exactly the dangerous way.

Pat

>Regards,
>--
>Thomas Roessler, W3C <tlr@w3.org>

--
---------------------------------------------------------------------
IHMC                   (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.   (850)202 4416   office
Pensacola              (850)202 4440   fax
FL 32502               (850)291 0667   cell
phayes@ihmc.us         http://www.ihmc.us/users/phayes
Received on Monday, 20 December 2004 18:13:02 UTC