Re: UNSAID drafted and mapped to SQL from Pat Hayes on 2004-12-20 (public-rdf-dawg-comments@w3.org from December 2004)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 20 Dec 2004 10:12:02 -0800
To: Thomas Roessler <tlr@w3.org>
Cc: Giles Hogben <giles.hogben@jrc.it>, Rigo Wenning <rigo@w3.org>, Eric Prud'hommeaux <eric@w3.org>, public-rdf-dawg-comments@w3.org
Message-Id: <p06001f25bdecbb263964@[192.168.1.7]>
>On 2004-12-18 21:58:34 -0800, Pat Hayes wrote, at
>http://lists.w3.org/Archives/Public/public-rdf-dawg/2004OctDec/0534.html:
>
>>  The message that started the thread
>> 
>>http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2004Nov/0016.html
>>  has an example that illustrates the point in its use case 2, the
>>  financial institution that must not send its prospectus to
>>  customers in the US or Canada. For this institution to rely on an
>>  UNSAID query to ensure this rule was obeyed would be very risky,
>>  since in general the RDF content against which the query is being
>>  evaluated is not known to be complete with regard to citizenship
>>  information. It cannot be so known, except by special access to
>>  off-web information, as there are currently no Web protocols for
>>  communicating the fact that a source is complete in this way.
>
>Indeed.  The same applies to the truthfulness of the information
>contained in the RDF graph, or to the trustworthiness of information
>about the graph's truthfulness that's transmitted inside the
>protocol.  That's, obviously, not a reason to declare RDF and SPARQL
>"very risky", and to drop them.

There is a key difference, however. If an agency publishes some 
RDF/OWL content which asserts that, say, Joe is an American citizen, 
then the specs do indeed establish that they are asserting this, so 
questions of trustworthiness and responsibility for published claims 
can be brought into the area of rational discussion. This does not 
apply to negation-as-failure.  If I publish some RDF/OWL which 
describes some facts about citizenship but which fails to mention 
that Joe is an American citizen, the specs insist that I have not 
thereby asserted that Joe is not a citizen. If you draw that 
conclusion, you do so at your risk, and I, the publisher, cannot be 
held responsible for any consequences of that inferential act by you. 
It would be a dangerous (IMO) mistake for SPARQL to imply in its 
design that this kind of (negation-by-failure) inference was intended 
or meant to be supported by an RDF or OWL reasoner; it could (will) 
be used to deflect responsibility for mistakes to the wrong agency.

>The point of applying UNSAID in the way described in use case 2 is,
>precisely, that the graph that's queried is assumed to be
>sufficiently complete for the querying party's purposes.

But that assumption is invisible on the semantic web. My point is 
that there is no way for a software agent to be told that a graph is 
'sufficiently complete' in the required sense. (No way to transmit 
that using http, if you like.) And recall that the intended goal of 
the semantic web is to allow software agents to make rational 
decisions. If a designer really wants to use this kind of reasoning 
on a source that it knows to be complete, I believe it is quite easy 
to do so without having UNSAID in the querying protocol. For example, 
the application can explicitly query for the rejection case and 
reject the instance if it finds the relevant triple; then it has 
performed an invalid inference, but has done so by using valid 
protocols. My quarrel is not with the reasoning strategy (though I 
have my doubts about it) but with the incorporation of an invalid 
reasoning process into the querying protocols.
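
A minimal sketch of the workaround described above, assuming a graph 
held as a plain set of (subject, predicate, object) tuples. The names 
GRAPH, EXCLUDED, match and may_send_prospectus, and all ex: terms, are 
hypothetical and chosen only for illustration. The query itself is a 
purely positive pattern match; the closed-world step lives in the 
application, not in the query protocol.

```python
# Hypothetical data: a tiny RDF-like graph as a set of triples.
GRAPH = {
    ("ex:joe", "ex:citizenOf", "ex:US"),
    ("ex:ann", "ex:citizenOf", "ex:FR"),
    ("ex:bob", "ex:holdsAccount", "ex:acct17"),  # no citizenship stated
}

# Countries whose citizens must not receive the prospectus (use case 2).
EXCLUDED = {"ex:US", "ex:CA"}


def match(graph, s=None, p=None, o=None):
    """Positive pattern query; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def may_send_prospectus(graph, customer):
    # Query only for the rejection case, using a valid (positive) query.
    for country in EXCLUDED:
        if match(graph, customer, "ex:citizenOf", country):
            return False
    # Falling through is the application's own closed-world assumption:
    # nothing in the graph licenses the conclusion "not a citizen".
    return True
```

Note that ex:bob is accepted only because the graph is silent about 
him, which is exactly the risk under discussion; the sketch just makes 
visible that the application, not the query language, takes that step.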

A related matter. UNSAID refers simply to the absence of a triple. 
But RDF supports entailment of triples by other triples, and such 
entailments become quite complex in RDFS and extremely complex in 
OWL; and RDF/XML is required by the various W3C WG charters to be the 
interchange syntax for these more complex languages. Suppose an 
OWL/RDF or RDFS triple store does not contain a certain triple, but 
that triple can be inferred by valid OWL or RDFS reasoning from 
triples that it does contain. In this case, a reasoner that relied on 
UNSAID to implement negation-by-failure would become logically 
incoherent, not merely mistaken: quite simple inputs would cause it 
to become enmeshed in contradictions. (It might be better to have 
something like UNIMPLIED rather than UNSAID, particularly as an RDF 
graph can be reasonably taken to be 'saying' any RDF-valid 
consequence of itself.)
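
To illustrate the difference, here is a small sketch under stated 
assumptions: the graph is a set of tuples, only two RDFS entailment 
rules are applied (rdfs9 and rdfs11, covering rdfs:subClassOf), and 
the functions unsaid and unimplied are hypothetical stand-ins, not 
proposed syntax. The ex: terms are invented for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical graph: Joe's membership of the wider class is never stated.
GRAPH = {
    ("ex:joe", RDF_TYPE, "ex:USCitizen"),
    ("ex:USCitizen", SUBCLASS, "ex:NorthAmericanCitizen"),
}

TARGET = ("ex:joe", RDF_TYPE, "ex:NorthAmericanCitizen")


def rdfs_subclass_closure(graph):
    """Close the graph under rdfs9 and rdfs11 only (a fragment of RDFS)."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in closed:
            if p != SUBCLASS:
                continue
            for (s2, p2, o2) in closed:
                # rdfs11: subClassOf is transitive.
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: an instance of a subclass is an instance of the superclass.
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if not new <= closed:
            closed |= new
            changed = True
    return closed


def unsaid(graph, triple):
    # True when the triple is syntactically absent from the graph.
    return triple not in graph


def unimplied(graph, triple):
    # True only when the triple is also absent from the (partial) closure.
    return triple not in rdfs_subclass_closure(graph)
```

Here unsaid(GRAPH, TARGET) holds while unimplied(GRAPH, TARGET) does 
not: a reasoner that treated the first as a negation would contradict 
what the graph entails.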

>  The
>judgment whether or not this kind of assumption is "very risky"
>(whatever this means) is not the protocol designer's to make, but
>strictly a business decision made by the party that applies the
>protocol.

The anticipated uses of SW technology require such decisions to be 
made by software, not by designers of software. Right now there is no 
way to transmit the necessary information to a piece of software. (I 
wish there were: the lack of this ability is a notable failure of the 
RDF/OWL effort, I now think, for which I must bear part of the 
responsibility.)

>In fact, the word "complete" is ambiguous here: While a graph may be
>incomplete, in the sense that it lacks facts that are out there
>(this is the notion of "incompleteness" that you apply to use case
>2),

Lacks a particular kind of fact. I agree that the notion of 
'completeness' here is ambiguous; that is part of the technical 
problem.

>the same graph may quite well be the querying party's complete
>knowledge of facts at some point of time.  In this context, UNSAID
>also serves to help a party know what it does not know.

I agree that is a potentially useful thing to be able to query. 
However, the very fact that your use cases relied on invalid 
reasoning (and the draft write-up explicitly mentioned invalid 
reasoning patterns) makes me worry that it will not be used in this 
way, but will almost certainly be used immediately and 
enthusiastically in invalid ways. And that this will produce a 
dangerous kind of inference-rot at a very basic layer of the semantic 
web.

>Here's another use case, to illustrate this: Consider a party (say,
>our bank) that knows it has partial information stored in an RDF
>graph -- e.g., some social information (say, the grandmother's
>maiden name) is only associated with some of the subjects (say, of
>class account holder) in the graph. The party needs to collect this
>information for all subjects of class account holder (say, due to
>stricter money laundering legislation). UNSAID enables the bank to
>acquire the missing information from those account holders for which
>it is needed, and later on also enables sanctions against account
>holders who do not provide it.

That is an excellent use case, I agree: using UNSAID to find out what 
is not said. I wish they were all like this. But is UNSAID really 
necessary for this? Or is it only a convenience? If it were possible 
to handle cases like this without using UNSAID explicitly, I would 
prefer that SPARQL require users to use a workaround.
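
One possible shape for such a workaround, sketched under assumptions: 
two positive queries are issued and the difference is taken in the 
application. The graph layout, the helper subjects, and the ex: terms 
are hypothetical; whether this counts as an acceptable convenience 
trade-off is the open question above.

```python
# Hypothetical bank graph: only some account holders have the social datum.
GRAPH = {
    ("ex:alice", "rdf:type", "ex:AccountHolder"),
    ("ex:bob", "rdf:type", "ex:AccountHolder"),
    ("ex:alice", "ex:grandmothersMaidenName", "Smith"),
}


def subjects(graph, predicate, obj=None):
    """Positive query: subjects of triples with this predicate (and object, if given)."""
    return {
        s for (s, p, o) in graph
        if p == predicate and (obj is None or o == obj)
    }


# Query 1: every account holder the graph mentions.
holders = subjects(GRAPH, "rdf:type", "ex:AccountHolder")

# Query 2: every subject for which the maiden name is recorded.
with_name = subjects(GRAPH, "ex:grandmothersMaidenName")

# The difference is computed by the application: holders about whom the
# graph says nothing on this point, not holders who lack a grandmother.
missing = holders - with_name
```

The result lists whom to ask, which is a claim about the graph's 
contents only and so stays within valid reasoning.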

>>  If SPARQL contains UNSAID then it will be inconsistent with any
>>  account of meaning which is based on the RDF/RDFS/OWL normative
>>  semantics. This will not render SPARQL unusable, but it will place it
>>  outside the 'semantic web layer cake' and probably lead to the
>>  eventual construction of a different, and rival, query language for
>>  use by Web reasoners.
>
>Conversely, standardization of a too restricted version of SPARQL
>(e.g., one without UNSAID) will drive applications to either
>competing query languages, or to incompatible extensions that
>provide the expressivity they need.

That would be a better outcome, IMO, than having an RDF query 
language in widespread use which would weaken the inferential 
foundations of much of the semantic web. If the basic RDF protocols 
do not respect the RDF semantics, then there really is no point in 
continuing with the semantic web effort.

>Note that this risk is not created by specifying a full version of
>SPARQL, including UNSAID, and by additionally profiling some subset
>of it that satisfies whatever assumptions you want to be able to
>make.

In an ideal world, everyone would read all the warnings in the spec 
and obey them rationally. However, a spec designer has to consider 
the real world. For example, it would be quite rational to allow 
blank nodes in query patterns; but we find in practice that if they 
are allowed, people often misuse them, or expect them to apply in 
ways that cannot be supported, or confuse them with query variables. 
So it is simpler, and better, to just not allow them, even though in 
some cases that requires users to express themselves more obliquely 
and use work-arounds. I feel strongly that UNSAID is in this category 
of a useful-if-you-know-exactly-how but dangerous-and-easy-to-misuse 
kind of a feature, and one that is better omitted than included. And 
I feel this way even more strongly when the email thread that 
suggested it, and the draft write-up of the language feature itself, 
both misuse it in exactly the dangerous way.

Pat

>Regards,
>--
>Thomas Roessler, W3C   <tlr@w3.org>


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 20 December 2004 18:13:02 UTC