RE: Concise Bounded Descriptions - updated, expanded, stand-alone definition from Karsten Otto on 2004-08-20 (www-rdf-interest@w3.org from August 2004)

From: Karsten Otto <otto@math.fu-berlin.de>
Date: Fri, 20 Aug 2004 18:13:59 +0200 (CEST)
To: Patrick.Stickler@nokia.com
cc: www-rdf-interest@w3.org
Message-ID: <Pine.LNX.4.56.0408201750480.3455@hobbes.mi.fu-berlin.de>
Hello,

thank you for clarifying this issue and explaining your position.
At the moment I only use N3 encoded CBDs for manually inspecting a
local RDF graph; as this contains all information about the resource in
question and enough "linking information" to explore its connections,
this is exactly what I need for debugging purposes.
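
For reference, what I extract locally can be sketched roughly as below.
This is only my own reading of the draft, not code taken from it: triples
are plain Python tuples, blank nodes are strings starting with "_:",
reifications are left out, and the names cbd, is_blank, known_ifps and
ex:hasDevice (as well as the device number value) are made up for
illustration. With an empty known_ifps it should behave like the original
CBD definition; with a known IFP, a blank node object is cut down to its
IFP statements.

```python
# Triples are (subject, predicate, object) tuples; blank nodes are
# strings starting with "_:". All names here are illustrative only.

def is_blank(node):
    return isinstance(node, str) and node.startswith("_:")

def cbd(graph, start, known_ifps=frozenset()):
    """Return a description of `start` as a set of triples.

    With an empty `known_ifps` this follows the plain CBD recursion.
    Reified statements are not handled in this sketch.
    """
    result = set()
    seen = set()
    pending = [start]
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        statements = [t for t in graph if t[0] == node]
        if node != start:
            ifp_statements = [t for t in statements if t[1] in known_ifps]
            if ifp_statements:
                # "if" branch: the blank node is identified by a known
                # IFP, so keep only those statements and stop here.
                result.update(ifp_statements)
                continue
        # "else" branch: keep everything and follow blank node objects.
        for s, p, o in statements:
            result.add((s, p, o))
            if is_blank(o):
                pending.append(o)
    return result

graph = {
    ("ex:room5", "ex:hasDevice", "_:p"),
    ("_:p", "rdf:type", "dev:Printer"),
    ("_:p", "foo:location", "bar:Room5"),
    ("_:p", "comp:uniqueDeviceNumber", "12345"),
}

# Sender without IFP knowledge: same as the original CBD definition.
plain = cbd(graph, "ex:room5")
# Sender that knows the IFP: statements about _:p are pruned.
pruned = cbd(graph, "ex:room5", known_ifps={"comp:uniqueDeviceNumber"})
```

Here plain holds all four triples, while pruned holds only the
ex:hasDevice triple and the comp:uniqueDeviceNumber triple.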

However I don't use IFPs, so I cannot provide a good use case for the
issue right now. If I encounter this problem in the future I'll get back
to you ;-)

Still one thing isn't clear to me, maybe you could explain it: Let's assume
an agent found some IFP qualified blank node in a CBD, and determines it
would like more information about it. This means that the agent should
query for a CBD of the blank node. How does this fit into the URIQA API,
namely MGET? Also, what is the metadata authority for a blank node?
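
To make the question concrete, here is a small sketch of how I currently
picture it, going by your explanation quoted below. It is an assumption on
my part, not something taken from the URIQA documents: the helper names
web_authority and mget_request are made up, and the request text is only
what I believe an MGET for a URI looks like. The point is that the
authority falls out of the URI, and a blank node has no URI to start from.

```python
from urllib.parse import urlsplit

def web_authority(node):
    """Return the host that answers for `node`, or None for a blank node."""
    if node.startswith("_:"):
        return None  # no URI, hence no authority to send a request to
    return urlsplit(node).netloc or None

def mget_request(uri):
    """Build the text of an MGET request for `uri` (sketch only)."""
    authority = web_authority(uri)
    if authority is None:
        raise ValueError("no web authority for %r" % uri)
    parts = urlsplit(uri)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return "MGET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (path, authority)

# The example URI from the quoted message has an authority ...
request = mget_request("http://www.example.com/blargh")
# ... but a blank node label does not.
blank_authority = web_authority("_:b1")
```

For the blank node, mget_request raises an error, which is exactly the
spot where I don't see what the agent is supposed to do.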

Regards,
Karsten Otto

(original message follows)

On Fri, 20 Aug 2004 Patrick.Stickler@nokia.com wrote:

> > On Fri, 20 Aug 2004 Patrick.Stickler@nokia.com wrote:
> >
> > > > > A draft of an updated, expanded, stand-alone definition for
> > > > > Concise Bounded Descriptions is now available
> > > > >
> > > > >   http://swdev.nokia.com/uriqa/CBD.html
> > > > >
> > > > [snip]
> > > >
> > > > Great to have this on its own page as a point of reference!
> > > > However, I have a problem with the new concept of the inverse
> > > > functional bounded description: It requires that both the sending
> > > > and receiving agents are schema/ontology-aware, and also that they
> > > > share the same schema/ontology knowledge, in order to correctly
> > > > create and interpret a CBD.
> > > >
> > > > For one, the sender needs to know that a given predicate is an
> > > > owl:InverseFunctionalProperty, so it can pick the "if"-branch of
> > > > the IFBD definition for an anonymous resource. However, this
> > > > knowledge may not always be available, e.g. in case of a simple
> > > > semantic web crawler. AFAIK the issue of finding all
> > > > schemata/ontologies for a given RDF graph is not solved in general
> > > > yet - or is it?
> > >
> > > If the sending agent is not aware that a given property is
> > > an owl:InverseFunctionalProperty, then it proceeds as if it
> > > is not.
> > >
> > Ok, seems like I read too much into the definition. I took it as
> > a MUST, and without getting too philosophical, an IFP *is* an IFP
> > whether I know it or not. But from your reply I gather it should
> > read something like "where the predicate *is known to be* an
> > owl:InverseFunctionalProperty".
> >
> > > Then again, if the agent has no knowledge about the property,
> > > it (ideally) would be able to submit a URIQA request to the
> > > metadata authority and obtain the information it needs.
> > >
> > Yes, this makes sense for a more complex agent. But I was thinking
> > of a simpler case, such as a passive RDF database with a frontend
> > for answering queries with CBDs (e.g. via URIQA :-)
> >
> > Also, what is the "metadata authority" you mention?
>
> Whoever controls the response to a URIQA request (e.g. MGET)
> (per the web authority of the URI). So, e.g., for the resource
> denoted by http://www.example.com/blargh the web authority
> is www.example.com and thus a description from www.example.com
> is the authoritative description (as opposed to, e.g. a description
> obtained from any other source, even if via the URIQA servlet
> API, e.g. http://www.google.com/uriqa?uri=http://www.example.com/blargh).
>
> >
> > > However, the definition/generation of CBDs still works just
> > > fine if such information is not available -- in fact, it is
> > > then the same as the original definition of CBDs.
> > >
> > > > Furthermore, the receiver also needs to know that a given predicate
> > > > is an IFP. This is a more serious issue, as it needs this to
> > > > determine whether the "if"- or the "else"-branch of the IFBD
> > > > definition was picked by the sender. In the "else" case, it already
> > > > has all known statements, but in the "if" case it might need to
> > > > issue another query (by IFP).
> > > >
> > > > Consequently, if the IFP is unknown to the receiver, it might
> > > > falsely conclude that it already got all information the sender
> > > > had on the resource.
> > >
> > > Well, I think it is fair to presume that if the receiving agent
> > > is going to do anything particularly useful with (i.e. make
> > > decisions based on) the received knowledge, that it will have to
> > > be aware of the vocabularies/ontologies in which that knowledge is
> > > expressed.
> > >
> > Agreed. However, the open nature of RDF implies that an agent does not
> > need to understand every statement in a graph. If a receiver does not
> > understand the IFP the sender used to "prune" the CBD, it will mistake
> > the pruned graph for the whole thing. IMHO there should be a way to
> > distinguish the two cases.
>
> I appreciate your point. I'm just not convinced that the subgraph
> returned as a CBD needs to contain such process-specific knowledge.
>
> E.g. one could include a statement
>
>    ?x rdf:type cbd:InverseFunctionalPropertyDistinguished .
>
> or some such triple to indicate which anonymous nodes are
> uniquely distinguished by inverse functional properties and
> which are not.
>
> Or one could include, as you suggested, statements about the
> properties themselves.
>
> But I'd need to see a strongly motivating use case for doing
> something like that. I.e. the goal is to keep CBDs as, er,
> "concise" as possible, so any knowledge to be included needs
> to fight hard to win a place in a CBD.
>
> Since I envision a semantic web where agents can obtain
> authoritative CBDs via dereferenceable URIs, the inclusion of
> information such as above is not IMO sufficiently justified
> (in the long term, at least).
>
> >
> > > Also, and again, I personally do not see CBDs as a complete
> > > solution to knowledge interchange between semantic web agents.
> > > Something equivalent or comparable to URIQA must also exist so
> > > that agents can further obtain the knowledge they need.
> > >
> > Of course. But my point is that the receiver cannot know that it
> > needs to send another query in the problematic case. In fact, as it
> > does not know the relevant IFP, it does not even have the necessary
> > parameters for the query.
>
> It does not have to know the IFPs. It can use the triples it has
> been given as the template. If there are IFPs, then all of those
> triples will include IFPs. If there are no IFPs, then none will
> include IFPs, and the query may identify more than one resource.
>
> Still, it would make more sense to me for the agent to ask about
> the properties it has never seen before, expanding its knowledge
> accordingly, before trying to use a half-blind brute force
> series of queries to extract additional knowledge about
> anonymous node denoted resources.
>
> >
> > > > I see two possible solutions to this problem: The CBD could
> > > > contain the relevant "ppp rdf:type owl:InverseFunctionalProperty"
> > > > statements, or indicate all relevant ontologies by way of
> > > > owl:imports. However, neither solution is viable for RDF-only
> > > > cases, such as querying the aforementioned simple spider agent.
> > >
> > > Or, if the receiving agent has no knowledge about those properties,
> > > it can either submit a URIQA MGET request, or ask the same source
> > > of that knowledge for additional knowledge about those properties.
> > > I.e., ask the sending agent what it knows about those properties
> > > (by sending the CBD of each, etc.)
> > >
> > > I see the definition of CBDs as a component of a general
> > > bootstrapping mechanism for the semantic web, not as an
> > > all-encompassing solution to knowledge interchange.
> > >
> > Yes, and I don't expect them to be anything else. But I was thinking of
> > a scenario where the receiving agent has limited resources (memory,
> > bandwidth, CPU power), for example because it resides on a tiny
> > embedded device. For that reason it cannot cache all ontologies it
> > might encounter, or ask for the precise definition of everything it
> > finds. But it can answer simple queries on the triple level, like
> > "find me things of rdf:type dev:Printer with foo:location bar:Room5".
> > The lookup service has some _:p in its database that matches these
> > criteria, but also has the IFP comp:uniqueDeviceNumber. The agent
> > does not know the comp ontology, so it does not know the information
> > about _:p is pruned, and would match if another query were formed.
> > (Sorry for this hasty example)
>
> I certainly am sympathetic to agents running in limited environments
> (after all, I'm a semantic web researcher working for Nokia ;-)
> but again, my experience has been that applications that deal with
> knowledge expressed with ontologies employing IFPs are aware of which
> IFPs are important -- and even more so, are highly selective of
> knowledge which syncs with their own limited vocabularies and disregard
> the rest (so if the embedded agent doesn't already know it's an IFP,
> it doesn't care and won't bother about it anyway).
>
> Again, I appreciate your point, but would like to see some hard
> and real use cases and experience demonstrate the need, rather
> than expanding the scope of CBDs merely "on a hunch" or as a matter
> of esthetics or "just in case" arguments.
>
> >
> > > > By the way this seems to be a more general case of the "crossing
> > > > layer boundaries"-problem previously discussed (but not solved) in
> > > > another mailing list thread [1].
> > >
> > > Well, given the examples you present in that referenced document,
> > > I would say that those "missing triples" are provided for by the
> > > closure rules defined in the RDF model theory. Triples that can be
> > > inferred are not the same as triples which are simply not included
> > > in a graph, but must be obtained separately.
> > >
> > Well, I think the cases are similar in that there the receiver is
> > supposed to understand rdfs:subClassOf etc, where here it is supposed
> > to understand owl:InverseFunctionalProperty. If the receiver does not
> > understand these, it will fail to process the information in the way
> > the sender intended.
>
> That is a much broader issue and a general challenge for achieving
> a critical mass of deployed semantic web solutions.
>
> I don't see that the definition of CBDs directly helps or hinders
> that problem (though I see that URIQA most certainly would help
> substantially).
>
> Thanks for the engaging questions. I hope you don't feel I'm in
> any way blowing you off and not taking your points seriously. I'm
> simply not convinced that there is a critical problem that would
> be solved by changing the definition of CBDs as opposed to other
> approaches/solutions.
>
> Cheers,
>
> Patrick
>
Received on Friday, 20 August 2004 16:14:03 UTC