Re: notes on use cases

"Butler, Mark" wrote:
> 
> > 3.2.6
> >
> > I am made nervous by the discussion of inferencing search.  This
> > formulation takes a tractable database problem and turns it into an
> > NP-hard or even undecidable computational nightmare.
> 
> I'm not an expert on Description Logics, but I thought the point of DL's
> were they ensured the problems were not NP-hard (although Jeremy Carroll,
> our OWL WG member here at HP has expressed some concern recently about
> whether this is true in the current version of OWL). Have I misunderstood
> something here?

Description Logics are indeed more tractable than full first order predicate
calculus: they are decidable, where FOPC is undecidable.

David is right that they can still be hard. Complete OWL-lite inference is, I
believe, EXPTIME (i.e. worse than NP) and OWL-DL is NEXPTIME (i.e. quite a bit
worse than that). However, for languages of the complexity of OWL-lite up to
nearly all of OWL-DL there are working systems that give effective answers in
short time on substantial datasets - i.e. there are demonstrations that they are
tractable in practice, even though the worst case complexity is horrible. This
is not true for the whole of OWL-DL - tractable implementation of that is a
research issue.

But ... for SIMILE there are lots of possible uses for inference and
corresponding methods of providing it that are entirely tractable.

First, I suspect a lot of what you need for SIMILE will reduce to
subClass/subProperty processing, transitive closures and simple inheritance
style processing. These are typically polynomial time problems or at least
tractable in practice.
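As a toy illustration of why this style of processing is tractable: the
transitive closure of a set of subClassOf pairs is a simple polynomial-time
fixpoint computation. (The class names here are invented, just for the
example.)

```python
# Toy sketch: transitive closure of subClassOf links by naive fixpoint.
# Class names are invented for illustration.

def transitive_closure(pairs):
    """Compute the transitive closure of a set of (sub, super) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

subclass = {("Manuscript", "Document"), ("Document", "Resource")}
entailed = transitive_closure(subclass)
# ("Manuscript", "Resource") is now among the entailed pairs
```

A real store would use a better algorithm than this quadratic fixpoint, but
even the naive version is polynomial, nowhere near the worst cases above.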

Second, beyond these core RDFS-style inferences, more can be achieved through
rule systems which are again polynomial complexity. Grosof, Horrocks et al have
demonstrated that an interesting fraction of DL constructs can be directly
translated to a rule-based formulation. This is one of the approaches we are
taking for our own OWL support in Jena2. Such rules can often be layered on top
of a database.
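To make the rule-based style concrete, here is a hedged sketch of forward
chaining over ground triples to a fixpoint (this is not Jena2's actual rule
syntax or API, and the property names are invented):

```python
# Sketch of forward-chaining rules over ground triples, run to a fixpoint.
# Not Jena2's rule language - property names invented for illustration.

facts = {("ex:paper1", "dc:creator", "ex:alice")}

def subproperty_rule(facts):
    # Assumed rule: dc:creator is a subPropertyOf dc:contributor.
    return {(s, "dc:contributor", o)
            for (s, p, o) in facts if p == "dc:creator"}

def forward_chain(facts, rules):
    """Apply each rule repeatedly until no new facts appear."""
    facts = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(facts) - facts
        if not new:
            return facts
        facts |= new

closed = forward_chain(facts, [subproperty_rule])
# ("ex:paper1", "dc:contributor", "ex:alice") is now in closed
```

Because each pass only adds ground triples drawn from a polynomial-sized
space, the whole computation stays polynomial in the size of the data.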

Third, I suspect that many of the important inferences can be eagerly evaluated
and cached so that the search is over a fully materialized view rather than
having to do inference on the fly. There is always a space/time trade off but
there are important inferences (e.g. creating a concept taxonomy or concept
recognition) that have high reusability and low cost to store.
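A minimal sketch of the materialized-view idea, with invented data: run the
inference step once up front, index the resulting full view, and queries
become plain lookups with no inference at query time.

```python
# Sketch of eager materialization: compute entailments once, index them,
# and answer queries by lookup. Data is invented for illustration.

from collections import defaultdict

def materialize(base_triples, inference_fn):
    """Run inference once, then index the full view by predicate."""
    view = inference_fn(base_triples)
    index = defaultdict(set)
    for (s, p, o) in view:
        index[p].add((s, o))
    return index

# With an identity "inference" step this just indexes the base data;
# in practice inference_fn would add the entailed triples too.
index = materialize({("a", "rdf:type", "Document")}, lambda ts: set(ts))
# index["rdf:type"] == {("a", "Document")}
```

The space/time trade-off mentioned above shows up directly: the index costs
storage proportional to the entailed view, but each query avoids re-running
the inference.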

> I came up with what I think is a helpful analogy here: we can consider
> description logics and inferencing as a form of information compression i.e.
> we can reduce a set of triples to a compressed form (the ontology) and then
> we can retrieve the uncompressed version (the ontology and the entailments)
> via an algorithm. We perform the compression for two reasons: firstly to
> simplify the task of description, and secondly to reduce the amount of
> memory required for representation. 

It's the saving on human effort, simplifying the description task, which I think
is the critical value here. You are right that you could think of this as
information compression, but the memory-saving aspect is a less convincing
benefit.
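A back-of-the-envelope illustration of the compression view (the numbers are
invented): a subclass chain of depth d plus n typed instances stands in for a
much larger expanded set of entailed type triples.

```python
# Toy arithmetic for the "compression" analogy: a chain of depth d subclass
# links plus n instance triples entails n*(d+1) type triples when expanded.

def expanded_size(n_instances, depth):
    # stored: one type triple per instance + the subclass chain itself
    stored = n_instances + depth
    # expanded view: each instance typed at its own class and every superclass
    expanded = n_instances * (depth + 1) + depth
    return stored, expanded

stored, expanded = expanded_size(1000, 5)
# stored == 1005, expanded == 6005
```

So the compression is real, but as argued above it is the saving in human
description effort, not the storage, that matters most.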

Other potential values of inference include:
  o mapping from one form (ontology) to another,
  o ability to combine data from multiple sources and explore consequences 
    (if the consequences stem from interactions between the two sources then 
    neither source would have been able to supply the entailments in ground 
    form), 
  o ability to post-hoc extend a schema and have consequences automatically 
    created (see earlier discussion on open schemas),
  o ability to fill in the blanks in data (validation tells you something 
    is missing, inference may be able to fill in the missing fields, at 
    least enough to be able to work with the data).
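A minimal sketch of that last point, assuming an rdfs:domain-style rule (the
names are invented): from a property's declared domain we can fill in a
missing rdf:type.

```python
# Sketch of "filling in the blanks": infer a missing rdf:type from an
# rdfs:domain-style declaration. Names invented for illustration.

def infer_types(triples, domains):
    """domains maps property -> class; add rdf:type triples from usage."""
    inferred = set(triples)
    for (s, p, o) in triples:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))
    return inferred

data = {("ex:item1", "dc:title", "Notes on use cases")}
result = infer_types(data, {"dc:title": "ex:Document"})
# ("ex:item1", "rdf:type", "ex:Document") is now present
```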

Dave

Received on Tuesday, 8 April 2003 09:02:30 UTC