Re: UNSAID drafted and mapped to SQL from Bob MacGregor on 2004-12-21 (public-rdf-dawg-comments@w3.org from December 2004)

From: Bob MacGregor <bmacgregor@siderean.com>
Date: Tue, 21 Dec 2004 08:02:34 -0800
To: Pat Hayes <phayes@ihmc.us>
CC: bgrosof@mit.edu, "Eric Prud'hommeaux" <eric@w3.org>, public-rdf-dawg-comments@w3.org
Message-ID: <41C8491A.1060503@siderean.com>
This issue of UNSAID is really much bigger than SPARQL -- it affects a
significant fraction of Web users.  If my understanding is correct, all
of the RuleML users will be taking the Deductive Database / SQL mindset,
which assumes closed-world semantics.
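The subject line refers to mapping UNSAID to SQL. As a minimal sketch of why the SQL mindset is closed-world, assume (hypothetically) that all triples sit in one SQL table `triples(s, p, o)`. An UNSAID-style pattern then becomes a `NOT EXISTS` subquery, which succeeds whenever no matching row is stored. This uses Python's standard `sqlite3` module and is one plausible mapping, not necessarily the one in the draft under discussion; the `ex:` names are invented for illustration.

```python
import sqlite3

# Hypothetical store: every triple flattened into a single SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "rdf:type", "ex:Customer"),
        ("ex:bob", "rdf:type", "ex:Customer"),
        ("ex:bob", "ex:citizenOf", "ex:US"),
    ],
)

# UNSAID-style pattern: customers with no stored ex:citizenOf triple.
# NOT EXISTS is true when the subquery returns no rows, i.e. negation as
# failure over exactly the rows in the table (a closed-world reading).
rows = conn.execute(
    """
    SELECT c.s FROM triples AS c
    WHERE c.p = 'rdf:type' AND c.o = 'ex:Customer'
      AND NOT EXISTS (
        SELECT 1 FROM triples AS t
        WHERE t.s = c.s AND t.p = 'ex:citizenOf'
      )
    """
).fetchall()
unsaid_customers = [r[0] for r in rows]
print(unsaid_customers)  # ['ex:alice']
```

SQL gives no way to say "this table might be missing rows", so the query silently treats ex:alice as having no citizenship at all.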

The key part of the discussion involves whether or not a system can
trust/assume that closed-world semantics applies to some graph / RDF
dataset.  While it's certainly the case that nothing in RDF sanctions
that assumption, only a small amount of machinery is needed, namely:
named graphs and a single predicate that can assert that a particular
graph has closed-world semantics.  Possibly, the predicate would be more
specific, asserting that a particular property (and its subproperties)
is closed within the named graph.
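The proposal above can be sketched as follows, assuming a dataset of named graphs plus a metadata graph that holds statements about those graphs. The predicate `ex:closedFor` and the `Dataset` class with its methods are hypothetical; the point is only that an UNSAID-style absence test is refused unless the graph has been declared closed for the property being tested.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]

# Hypothetical predicate: (graph, ex:closedFor, property) declares that the
# named graph is complete with respect to that property.
CLOSED_FOR = "ex:closedFor"


@dataclass
class Dataset:
    graphs: dict[str, set[Triple]] = field(default_factory=dict)
    # Statements *about* the named graphs, e.g. closure declarations.
    meta: set[Triple] = field(default_factory=set)

    def is_closed(self, graph: str, prop: str) -> bool:
        return (graph, CLOSED_FOR, prop) in self.meta

    def unsaid(self, graph: str, s: str, p: str, o: str) -> bool:
        """Absence test that is only permitted on a declared-closed graph."""
        if not self.is_closed(graph, p):
            raise ValueError(f"{graph} is not declared closed for {p}")
        return (s, p, o) not in self.graphs.get(graph, set())


ds = Dataset(
    graphs={"ex:g1": {("ex:joe", "ex:citizenOf", "ex:CA")}},
    meta={("ex:g1", CLOSED_FOR, "ex:citizenOf")},
)
print(ds.unsaid("ex:g1", "ex:ann", "ex:citizenOf", "ex:US"))  # True

try:
    ds.unsaid("ex:g1", "ex:ann", "ex:name", "Ann")
except ValueError as err:
    print(err)  # ex:g1 is not declared closed for ex:name
```

A fuller version would also honour a closure declaration for the property's subproperties; that is left out here.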

On the other hand, until named graphs become officially blessed
(instead of just something that everyone recognizes would be a major
step forward), this solution might not be viable for SPARQL.  Then
again, our implementation of SPARQL already includes UNSAID -- no sense
in waiting around for what will inevitably arrive sooner or later.

Cheers, Bob

Pat Hayes wrote:

>> On 2004-12-18 21:58:34 -0800, Pat Hayes wrote, at
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2004OctDec/0534.html:
>>
>> > The message that started the thread
>> > http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2004Nov/0016.html
>> > has an example that illustrates the point in its use case 2, the
>> > financial institution that must not send its prospectus to
>> > customers in the US or Canada. For this institution to rely on an
>> > UNSAID query to ensure this rule was obeyed would be very risky,
>> > since in general the RDF content against which the query is being
>> > evaluated is not known to be complete with regard to citizenship
>> > information. It cannot be so known, except by special access to
>> > off-web information, as there are currently no Web protocols for
>> > communicating the fact that a source is complete in this way.
>>
>> Indeed.  The same applies to the truthfulness of the information
>> contained in the RDF graph, or to the trustworthiness of information
>> about the graph's truthfulness that's transmitted inside the
>> protocol.  That's, obviously, not a reason to declare RDF and SPARQL
>> "very risky", and to drop them.
>
>
> There is a key difference, however. If an agency publishes some 
> RDF/OWL content which asserts that, say, Joe is an American citizen, 
> then the specs do indeed establish that they are asserting this, so 
> questions of trustworthiness and responsibility for published claims 
> can be brought into the area of rational discussion. This does not 
> apply to negation-as-failure.  If I publish some RDF/OWL which 
> describes some facts about citizenship but which fails to mention that 
> Joe is an American citizen, the specs insist that I have /not/ thereby
> asserted that Joe is not a citizen. If you draw that conclusion, /you/
> do so at your risk, and I, the publisher, cannot be held responsible 
> for any consequences of that inferential act by you. It would be a 
> dangerous (IMO) mistake for SPARQL to imply in its design that this 
> kind of (negation-by-failure) inference was intended or meant to be 
> supported by an RDF or OWL reasoner; it could (will) be used to 
> deflect responsibility for mistakes to the wrong agency.
>
>> The point of applying UNSAID in the way described in use case 2 is,
>> precisely, that the graph that's queried is assumed to be
>> sufficiently complete for the querying party's purposes.
>
>
> But that assumption is invisible on the semantic web. My point is that 
> there is no way for a software agent to be told that a graph is 
> 'sufficiently complete' in the required sense. (No way to transmit 
> that using http, if you like.) And recall that the intended goal of
> the semantic web is to allow /software agents/ to make rational
> decisions. If a designer really wants to use this kind of reasoning on 
> a source that it knows to be complete, I believe it is quite easy to 
> do so without having UNSAID in the querying protocol. For example, the 
> application can explicitly query for the rejection case and reject the 
> instance if it finds the relevant triple; then it has performed an 
> invalid inference, but has done so by using valid protocols. My
> quarrel is not with the reasoning strategy (though I have my doubts 
> about it) but with the incorporation of an invalid reasoning process 
> into the querying protocols.
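Pat's suggested workaround, applied to use case 2, might look like the sketch below: the query layer only answers a positive question (which citizenship triples are stated?), and the application itself decides whether an empty answer may be read as "not a US or Canadian citizen". The `source_is_complete` flag stands in for the off-web knowledge mentioned above; it, the helper names and the `ex:` terms are assumptions for illustration.

```python
Triple = tuple[str, str, str]

# Countries whose citizens must not receive the prospectus (use case 2).
BLOCKED = {"ex:US", "ex:CA"}


def match(graph: set[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Positive triple-pattern match; None plays the role of a query variable."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def may_send_prospectus(graph: set[Triple], customer: str,
                        source_is_complete: bool) -> bool:
    # Valid protocol use: ask only what the graph positively states.
    stated = {t[2] for t in match(graph, s=customer, p="ex:citizenOf")}
    if stated & BLOCKED:
        return False  # an asserted triple rules the customer out
    # Reading "nothing found" as "not a US/Canadian citizen" is the
    # application's own closed-world decision, based on off-web knowledge.
    return source_is_complete


graph = {("ex:joe", "ex:citizenOf", "ex:US")}
print(may_send_prospectus(graph, "ex:joe", True))   # False
print(may_send_prospectus(graph, "ex:ann", True))   # True
print(may_send_prospectus(graph, "ex:ann", False))  # False
```

The risky inference still happens, but it sits visibly in application code rather than in the query language.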
>
> A related matter. UNSAID refers simply to the absence of a triple. But 
> RDF supports entailment of triples by other triples, and such 
> entailments become quite complex in RDFS and extremely complex in OWL; 
> and RDF/XML is required by the various W3C WG charters to be the 
> interchange syntax for these more complex languages. Suppose an 
> OWL/RDF or RDFS triple store does not contain a certain triple, but 
> that triple can be inferred by valid OWL or RDFS reasoning from 
> triples that it does contain. In this case, a reasoner that relied on 
> UNSAID to implement negation-by-failure would become logically 
> incoherent, not merely mistaken: quite simple inputs would cause it to 
> become enmeshed in contradictions. (It might be better to have 
> something like UNIMPLIED rather than UNSAID, particularly as an RDF 
> graph can be reasonably taken to be 'saying' any RDF-valid consequence 
> of itself.)
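The gap between UNSAID and the suggested UNIMPLIED can be shown with a small graph. The sketch below uses hypothetical `ex:` terms and implements only two RDFS entailment rules, rdfs9 (type inheritance through `rdfs:subClassOf`) and rdfs11 (transitivity of `rdfs:subClassOf`); a full RDFS or OWL reasoner entails far more. The queried triple is absent from the stored graph but entailed by it, so the two tests disagree.

```python
Triple = tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(graph: set[Triple]) -> set[Triple]:
    """Fixed point of rdfs9 and rdfs11 only (not full RDFS entailment)."""
    closed = set(graph)
    while True:
        new = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:
                    new.add((a, SUBCLASS, d))  # rdfs11: transitivity
                elif p1 == TYPE:
                    new.add((a, TYPE, d))      # rdfs9: type inheritance
        if new <= closed:
            return closed
        closed |= new


def unsaid(graph: set[Triple], triple: Triple) -> bool:
    return triple not in graph                # absence of the stored triple


def unimplied(graph: set[Triple], triple: Triple) -> bool:
    return triple not in rdfs_closure(graph)  # absence of an entailment


g = {("ex:joe", TYPE, "ex:USCitizen"),
     ("ex:USCitizen", SUBCLASS, "ex:NorthAmerican")}
q = ("ex:joe", TYPE, "ex:NorthAmerican")
print(unsaid(g, q))     # True: the triple is not literally stored
print(unimplied(g, q))  # False: it follows by rdfs9
```

A reasoner that answered this query with `unsaid` would conclude that ex:joe is not an ex:NorthAmerican, while the very same graph entails that he is.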
>
>> The judgment whether or not this kind of assumption is "very risky"
>> (whatever this means) is not the protocol designer's to make, but
>> strictly a business decision made by the party that applies the
>> protocol.
>
>
> The anticipated uses of SW technology require such decisions to be 
> made by software, not by designers of software. Right now there is no 
> way to transmit the necessary information to a piece of software. (I 
> wish there were: the lack of this ability is a notable failure of the 
> RDF/OWL effort, I now think, for which I must bear part of the 
> responsibility.)
>
>> In fact, the word "complete" is ambiguous here: While a graph may be
>> incomplete, in the sense that it lacks facts that are out there
>> (this is the notion of "incompleteness" that you apply to use case
>> 2),
>
>
> Lacks a particular kind of fact. I agree that the notion of 
> 'completeness' here is ambiguous; that is part of the technical problem.
>
>> the same graph may quite well be the querying party's complete
>> knowledge of facts at some point of time.  In this context, UNSAID
>> also serves to help a party know what it does not know.
>
>
> I agree that is a potentially useful thing to be able to query. 
> However, the very fact that your use cases relied on invalid reasoning 
> (and the draft write-up explicitly mentioned invalid reasoning
> patterns) makes me worry that it will not be used in this way, but 
> will almost certainly be used immediately and enthusiastically in 
> invalid ways. And that this will produce a dangerous kind of 
> inference-rot at a very basic layer of the semantic web.
>
>> Here's another use case, to illustrate this: Consider a party (say,
>> our bank) that knows it has partial information stored in an RDF
>> graph -- e.g., some social information (say, the grandmother's
>> maiden name) is only associated with some of the subjects (say, of
>> class account holder) in the graph. The party needs to collect this
>> information for all subjects of class account holder (say, due to
>> stricter money laundering legislation). UNSAID enables the bank to
>> acquire the missing information from those account holders for which
>> it is needed, and later on also enables sanctions against account
>> holders who do not provide it.
>
>
> That is an excellent use case, I agree: using UNSAID to find out what 
> is not said. I wish they were all like this. But is UNSAID really 
> necessary for this? Or is it only a convenience? If it were possible 
> to handle cases like this without using UNSAID explicitly, I would 
> prefer that SPARQL require users to use a workaround.
>
>> > If SPARQL contains UNSAID then it will be inconsistent with any
>> > account of meaning which is based on the RDF/RDFS/OWL normative
>> > semantics. This will not render SPARQL unusable, but it will place it
>> > outside the 'semantic web layer cake' and probably lead to the
>> > eventual construction of a different, and rival, query language for
>> > use by Web reasoners.
>>
>> Conversely, standardization of a too restricted version of SPARQL
>> (e.g., one without UNSAID) will drive applications to either
>> competing query languages, or to incompatible extensions that
>> provide the expressivity they need.
>
>
> That would be a better outcome, IMO, than having an RDF query language 
> in widespread use which would weaken the inferential foundations of 
> much of the semantic web. If the basic RDF protocols do not respect 
> the RDF semantics, then there really is no point in continuing with 
> the semantic web effort.
>
>> Note that this risk is not created by specifying a full version of
>> SPARQL, including UNSAID, and by additionally profiling some subset
>> of it that satisfies whatever assumptions you want to be able to
>> make.
>
>
> In an ideal world, everyone would read all the warnings in the spec 
> and obey them rationally. However, a spec designer has to consider the 
> real world. For example, it would be quite rational to allow blank 
> nodes in query patterns; but we find in practice that if they are 
> allowed, people often misuse them, or expect them to apply in ways
> that cannot be supported, or confuse them with query variables. So it 
> is simpler, and better, to just not allow them, even though in some 
> cases that requires users to express themselves more obliquely and use 
> work-arounds. I feel strongly that UNSAID is in this category of a 
> useful-if-you-know-exactly-how but dangerous-and-easy-to-misuse kind 
> of a feature, and one that is better omitted than included. And I feel 
> this way even more strongly when the email thread that suggested it, 
> and the draft write-up of the language feature itself, both misuse it 
> in exactly the dangerous way.
>
> Pat
>
>> Regards,
>> --
>> Thomas Roessler, W3C   <tlr@w3.org>
>
>-- 
>  
>
> ---------------------------------------------------------------------
> IHMC               (850)434 8903 or (650)494 3973   home
> 40 South Alcaniz St.       (850)202 4416   office
> Pensacola                 (850)202 4440   fax
> FL 32502                     (850)291 0667    cell
> phayes@ihmc.us       http://www.ihmc.us/users/phayes


-- 

Bob MacGregor
Chief Scientist

Siderean Software Inc
390 North Sepulveda Blvd., Suite 2070
El Segundo, CA 90245
bmacgregor@siderean.com
tel: +1-310-647-4266
fax: +1-310-647-3470
Received on Tuesday, 21 December 2004 16:03:35 UTC