Re: Extensibility: Fallback vs. Monolithic

Bijan Parsia wrote:
> On 28 Jun 2007, at 10:07, Axel Polleres wrote:
> 
>>
>> Bijan Parsia wrote:
>>
>>> On Jun 27, 2007, at 11:56 AM, Axel Polleres wrote:
>>> [snip]
>>>
>>>> a) ignoring X will lead to sound inferences only, but inferences
>>>>    might be incomplete
>>>> b) ignoring Y will preserve completeness, but unsound inferences
>>>>    might arise
>>>> c) ignoring Z will preserve neither soundness nor completeness
>>>>
>>>> etc.
>>>> while the latter two are probably pointless,
>>>
>>> [snip]
>>> Since I don't know exactly what's being ignored, my conversations   
>>> with various users, esp. those working on biology, certainly  
>>> suggest  that B is to be preferred to A (i.e., they *really* don't  
>>> want to  miss answers, and in their case it's pretty easy to check  
>>> for  spuriousness, and, typically, the likelihood of false  positives 
>>> is low).
>>
>>
>> That depends on the use case.
> 
> 
> Obviously. I was pointing to the general characteristics based on  user 
> feedback. This suggests that B isn't *probably* (provably?)  pointless.
> 
>> If you have a use case for B, fair enough.
>> I just couldn't think of one, whereas I can think of various use  
>> cases for A.
> 
> 
> Sure, hence my pointing out that there are several.
> 
> Search engines in general seem to provide plenty of use cases. But  
> consider:
>     Sales leads...completeness might be better than soundness
>     Fraud detection...completeness might be better than soundness
>         Any sort of threat detection as long as the false positives
> aren't that bad, e.g., diagnosis
> Just about anything that you might verify afterwards. Also consider
>     Robot navigation...some answer fast is better than no answer as you  
> will have lots of correction
>     Stock purchasing...some answer fast may be better than a perfect  
> answer too late
>         There they will quantify better
>     Exploring known dirty data (all data is dirty)
>         There are arguments for all three plus s&c here depending on 
> the  specifics.
> Etc. etc. etc.
> 
>>> Similarly, if I just need *an* answer (not every answer) but I  need  
>>> it quickly, c could be fine as long as 1) the probability of  some  
>>> correct answer, if there is one, being in the answer set is  high
>>
>>
>> the higher this probability, the closer you get to A ;-)
> 
> 
> Not really. Think search engines. You may be willing to take a bit of  
> noise and missed answers so long as *a good enough* answer appears  and 
> is discernible by the person (i.e., you don't need the engine to filter
> out all the spurious answers).
> 
>> but since I didn't think about probabilistic extensions here yet...
> 
> 
> I'm not talking probabilistic extensions. I'm talking about how I, as  a 
> user, assess the utility of a proof procedure.
> 
>> I think then you'd rather need a fallback like:
>>
>> c') ignoring z will neither preserve soundness nor completeness,
>>     but preserve soundness with probability greater than <threshold>
> 
> 
> Well, I can trivially meet a) in many cases by ignoring *EVERYTHING*  
> and making no inferences at all. Clearly that's not so useful either.

*ggg* I was waiting for that one.
The same goes for b) when asserting *EVERYTHING*, obviously... :-)))

We can agree that there are use cases for both a and b, and that there 
are trivial corner cases for both.

>> anyway, if this threshold can't be named, I don't see good use cases.
> 
> Total fallacy. Just because I can't measure "exactly" or even roughly  
> with precision doesn't mean I can't make reasonable  assessments. Why  
> are these use cases bad?

I didn't say that the use cases are bad.
I am simply worried about what something like *good enough* means.
If it is not defined (even informally in a description, let's say, 
which should be the minimal requirement here), it is hard to use. 
Obviously, even a given "probability" is hard to assess, but the 
rationale for why possibly unsound answers are good enough should be 
given, and ideally also why unsound answers do not have serious impact.
  As I said, this could be in a description, or by referring to some 
document which explains the limitations, or whatever; it doesn't need 
to be formally checkable.

> I'll note that you didn't provide even the level of detail about your  
> use cases that I did. Your actual example is entirely nominal. (Of  
> course, that's fine because it's pretty easy to see your point.) I'm  
> unclear about why you are so dismissive of mine, 

I am not at all dismissive! Actually, I appreciate that there are use 
cases for B (still unsure about C); I just want to clarify things.

> esp. by appealing to a standard which you hadn't met or established 
> in this conversation  yet. 
> (And a standard which probably isn't needed at this stage of the  game.)



>>> and 2) the answer set isn't too big and 3) I can check for
>>> correctness  well enough or the consequence of the occasional  wrong 
>>> answer is low.
>>
>>
>> ... as before: if "well enough" can't be quantified, I feel a bit 
>> uneasy here.

let me retract "quantified" here to "described", in the sense of what I 
said above.

> It's no worse than with A, really. What if the missing answers are  
> critical? What if the *data* are bad so many of the sound answers are  
> actually bad as well, thus you need all of them (or maybe some that  
> aren't sound)?

Yes, it might be a good idea to also specify that for A in some way: 
which are the inferences that you would lose?
(For my admittedly simple example, this would be that you possibly lose 
inferences of rules with negation, and upwards in the dependency graph.)

> Specific analysis plus testing is the usual way. And testing you have  
> to do because of bugs and bad data anyway.
> 
>>> And of course, if my overall probability of error due to  (inherent)  
>>> unsoundness or incompleteness plus the chance of a bug  in my 
>>> reasoner  is much less than the chance of a bug in an  inherently 
>>> sound and  complete reasoner, well, that's a reason to  prefer the 
>>> former.
>>> I imagine that life is easier all around if the ignoring is   
>>> standardized. It's probably a bit easier to explain to users that  
>>> the  system "ignores these bits and reasons with the rest" than to  
>>> explain  how some particular approximation technique works in  other 
>>> terms. Oh,  and clearly a is the easiest to explain because  of 
>>> examples like  yours. It's also easier to get interoperability  since 
>>> you can require  soundness and completeness for the pruned  document.
>>
>>
>> I think I agree, though I am admittedly not quite sure what the 
>> concrete point is that you want to make here? :-)
> 
> 
> The concrete point is that soundness, completeness, and decidability  
> are useful metalogical properties of a system, esp for specification,  
> analysis of systems, and interoperability. But there are good cases  for 
> departing from all of them. 

great!

> The problem is that if you do this in  the engine, then specification, 
> analysis, and interop, plus  explaining to users gets harder. If you do 
> it in the document, i.e.,  generate 
> *documents* which are approximations of the original, then  run a s&c 
> engine on the approximations, users find that easier to  understand 
> overall, I believe.

Anyway, trying to sum up, it seems that the original thing in the 
message still holds. It might be helpful, instead of only saying:

  If you ignore X then "something bad happens"

to distinguish between

  If you ignore X then you lose soundness

  If you ignore X then you lose completeness

and then allowing additional (possibly descriptive)
annotations which say for which cases you lose soundness or 
completeness, respectively.

yes?
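For concreteness, such an annotation could be as little as a record naming the ignored construct, the property lost, and an informal description of the affected cases. This is purely a hypothetical sketch; the field names are mine, not from any specification.

```python
from dataclasses import dataclass

@dataclass
class FallbackAnnotation:
    """Hypothetical annotation attached to a rule set/dialect,
    saying what a consumer may ignore and at what cost."""
    ignored_feature: str  # the construct a consumer may ignore
    loses: str            # "soundness" or "completeness"
    description: str      # informal account of the affected cases

ann = FallbackAnnotation(
    ignored_feature="negation-as-failure",
    loses="completeness",
    description=("inferences from rules with negation, and everything "
                 "upwards in the dependency graph, may be missed"),
)
```

The point is only that the description field need not be formally checkable, as argued above; it just has to be stated.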

axel

p.s.: BTW, incompleteness for a rule set/dialect can in some cases imply 
unsoundness for rules or queries you add on top, especially in the 
search scenario if you allow negation as failure in search queries; see 
[1], where we tried to nail this down a bit with the notion of "context 
monotonicity" ... just to add some self-citation ;-)

1. Axel Polleres, Cristina Feier, and Andreas Harth. Rules with 
contextually scoped negation. In Proceedings of the 3rd European 
Semantic Web Conference (ESWC2006), volume 4011 of Lecture Notes in 
Computer Science, Budva, Montenegro, June 2006. Springer.
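The p.s. can be sketched in a few lines: an *incomplete* base can make a negation-as-failure query *unsound*, because the query treats a missed answer as a confirmed absence. The facts and predicate names below are illustrative only.

```python
# A complete base vs. one that missed a fact (incompleteness):
complete_base = {"cited(paperA)", "cited(paperB)"}
incomplete_base = {"cited(paperA)"}  # "cited(paperB)" was missed

def naf_query(base, atom):
    """'not atom' under negation as failure:
    true iff atom is absent from the base."""
    return atom not in base

# Against the complete base, 'not cited(paperB)' correctly fails:
# naf_query(complete_base, "cited(paperB)") -> False
# Against the incomplete base, the same query spuriously succeeds,
# i.e. the missed answer has turned into an unsound inference:
# naf_query(incomplete_base, "cited(paperB)") -> True
```

This is the sense in which incompleteness "leaks" into unsoundness once non-monotonic constructs sit on top.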


-- 
Dr. Axel Polleres
email: axel@polleres.net  url: http://www.polleres.net/

Received on Thursday, 28 June 2007 11:14:33 UTC