RE: Concise Bounded Descriptions - updated, expanded, stand-alone definition from Patrick.Stickler@nokia.com on 2004-08-21 (www-rdf-interest@w3.org from August 2004)

From: <Patrick.Stickler@nokia.com>
Date: Sat, 21 Aug 2004 10:50:38 +0300
To: <otto@math.fu-berlin.de>
Cc: <www-rdf-interest@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B02A2E9EE@trebe006.europe.nokia.com>
> -----Original Message-----
> From: ext Karsten Otto [mailto:otto@math.fu-berlin.de]
> Sent: 20 August, 2004 19:14
> To: Stickler Patrick (Nokia-TP-MSW/Tampere)
> Cc: www-rdf-interest@w3.org
> Subject: RE: Concise Bounded Descriptions - updated, expanded,
> stand-alone definition
> 
> 
> Hello,
> 
> thank you for clarifying this issue and explaining your position.
> At the moment I only use N3 encoded CBDs for manually inspecting a
> local RDF graph; as this contains all information about the resource
> in question and enough "linking information" to explore its connections,
> this is exactly what I need for debugging purposes.
> 
> However I don't use IFPs, so I cannot provide a good use case for the
> issue right now. If I encounter this problem in the future I'll get
> back to you ;-)
> 
> Still one thing isn't clear to me, maybe you could explain it: Let's
> assume an agent found some IFP qualified blank node in a CBD, and
> determines it would like more information about it. This means that
> the agent should query for a CBD of the blank node. How does this fit
> into the URIQA API, namely MGET?


I'm very glad you asked this question, because I was planning on 
a followup post to my last one to address this important issue.

There are two key points I want to make:

1. The refinement of the CBD definition to take into account
   known IFPs is primarily intended to be a "sanity check" for
   the sending agent -- so that it does not inadvertently
   dump most/all of its knowledge base when describing a single
   named resource.

   For those using FOAF, where most "significant" resources
   (persons, organizations, etc.) are denoted by anonymous
   nodes differentiated by IFPs, the revised definition is
   very important for efficiency.

2. If some resource is important enough that some agent may
   want to ask about it specifically, you should denote it
   using a URI. Period. 

   You don't necessarily have to use an http: URI, or even an
   HTTP dereferencable URI (though IMO you should, and it's
   really not that hard to do) but a guid: URI or a uuid:
   URI or a tag: URI or any kind of urn: URI will do fine.

   The miracle of OWL's IFPs and owl:sameAs is that even if
   everyone uses different "local" URIs to denote people,
   organizations, etc. it's straightforward to equate the
   different names so we know we're talking about the same
   thing (and for those who seem to have a particular aversion
   to having many URIs denote the same resource -- note that
   numerous anonymous nodes denoting the same resource are just
   as distinct/unique/non-mergeable as different URIs, but
   just less useful/convenient for querying).

   IMO, it is non-optimal-practice to use vocabularies such
   as FOAF to describe resources such as persons and organizations
   and denote them with anonymous nodes. I'm a big fan of FOAF
   and admire the way it uses IFPs effectively, but please, even
   if you use FOAF, name the damn things with URIs! ;-)

   Then we don't have to employ round-about query tactics to
   get at them indirectly via IFPs.

   IMO the primary use of IFPs should be to infer the denotational
   equivalence of two nodes, not to serve as the primary means of
   accessing a particular anonymous node in a query. Folks should
   avoid *relying* on IFPs to differentiate
   their resources. If resources are important enough to ask about
   and talk about in a broader context than just one system (or even
   then) give them URIs.
   
   To specifically answer your question above, to access an anonymous
   node via IFPs, one cannot use URIQA to simply ask for a CBD,
   since such requests are based on URIs naming the resource in
   question. Rather, you have to employ a much more general and
   capable query solution (e.g. what the DAWG are working on)
   which puts much greater demands on both ends of the implementation.

   That's not to say that full RDF query solutions are not to be
   promoted and used, only that the practice of relying on IFPs
   to distinguish anonymous node denoted resources rather than using
   URIs to denote such resources borders IMO on "bad practice" and
   imposes those greater processing/query burdens since simple
   protocols such as URIQA cannot be used.
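
To make the contrast concrete, here is a minimal Python sketch. It is not
taken from the CBD draft or from any URIQA implementation, and it rests on
several assumptions: a toy graph of (subject, predicate, object) tuples,
blank nodes written as "_:x", no reification handling, and hypothetical
names (GRAPH, cbd, find_by_ifp). cbd() describes a URI-named resource and,
when the sender knows a property is an IFP, describes a blank node by its
IFP statements only; find_by_ifp() is the round-about pattern match needed
to reach an anonymous node.

```python
# Toy graph (illustrative data only): (subject, predicate, object) tuples.
FOAF_MBOX = "foaf:mbox"

GRAPH = {
    ("ex:doc", "dc:creator", "_:a"),
    ("_:a", "foaf:name", "Alice"),
    ("_:a", FOAF_MBOX, "mailto:alice@example.com"),
    ("_:a", "foaf:knows", "_:b"),
    ("_:b", "foaf:name", "Bob"),
}


def is_blank(node):
    """Assumption of this sketch: blank nodes are strings starting with '_:'."""
    return isinstance(node, str) and node.startswith("_:")


def cbd(graph, start, known_ifps=frozenset()):
    """Simplified CBD of `start`, expanded recursively through blank nodes.

    A blank node with at least one statement whose predicate is a *known*
    IFP is described by those statements only and is not expanded further.
    If no IFPs are known, this is the plain blank-node closure.
    """
    result, seen, todo = set(), {start}, [start]
    while todo:
        node = todo.pop()
        about = [t for t in graph if t[0] == node]
        if is_blank(node) and node != start:
            ifp_only = [t for t in about if t[1] in known_ifps]
            if ifp_only:
                result.update(ifp_only)
                continue
        for s, p, o in about:
            result.add((s, p, o))
            if is_blank(o) and o not in seen:
                seen.add(o)
                todo.append(o)
    return result


def find_by_ifp(graph, prop, value):
    """General pattern query: which nodes have this value for this property?"""
    return {s for s, p, o in graph if p == prop and o == value}


full = cbd(GRAPH, "ex:doc")                 # sender knows no IFPs
pruned = cbd(GRAPH, "ex:doc", {FOAF_MBOX})  # sender knows foaf:mbox is an IFP
who = find_by_ifp(GRAPH, FOAF_MBOX, "mailto:alice@example.com")
```

In this toy data, full contains all five statements, pruned contains only
the dc:creator statement plus the foaf:mbox statement for _:a, and who is
{"_:a"}. Had _:a been given a URI, a single cbd() call on that URI would
have sufficed and the pattern query would be unnecessary.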

> Also, what is the metadata authority for a blank node?

Firstly, neither the definition of CBD nor URIQA addresses the idea
of metadata authority for any particular node in a graph. URIQA
defines the idea of metadata authority in terms of the web authority
component of an HTTP dereferencable URI, for those URIs that contain 
web authority components. Since one cannot use the URIQA protocol
to query about particular anonymous nodes, the question cannot be
answered insofar as URIQA or CBDs are concerned.
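
As a rough illustration of "metadata authority in terms of the web
authority component", the following Python sketch extracts the authority
from a URI with the standard urllib.parse module. The function names
(web_authority, is_authoritative) are hypothetical and not part of URIQA,
and the sketch simply treats any URI without an http(s) authority,
including a blank node label, as having no metadata authority.

```python
from urllib.parse import urlsplit


def web_authority(uri):
    """Return the authority component of an http(s) URI, else None."""
    parts = urlsplit(uri)
    if parts.scheme in ("http", "https") and parts.netloc:
        return parts.netloc
    return None


def is_authoritative(resource_uri, responder_host):
    """True if a description served by `responder_host` comes from the
    web authority of `resource_uri` (simple case-insensitive host match)."""
    authority = web_authority(resource_uri)
    return authority is not None and authority.lower() == responder_host.lower()
```

Under these assumptions, web_authority("http://www.example.com/blargh")
is "www.example.com", so a description of that resource served by
www.example.com counts as authoritative while one served by
www.google.com does not. For "urn:uuid:1234" or "_:b1" the function
returns None, which matches the point that the question has no answer
for anonymous nodes.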

Secondly, and technically, I would presume that the metadata
authority for a particular anonymous node would be the originator
of the graph in which that node first occurs, but that's pretty
slippery and probably not very useful in practice.

This is another example of why "significant" resources should
always be named with URIs, because URIs also can provide a means
to differentiate between authoritative and third-party assertions
about that resource, whereas anonymous nodes are (in practice)
untraceable (i.e. unless a lot of provenance infrastructure is added,
which tends to be non-portable in many cases anyway).

SUMMARY: Name everything with URIs and your life will be easier.

Cheers,

Patrick





> Regards,
> Karsten Otto
> 
> (original message follows)
> 
> On Fri, 20 Aug 2004 Patrick.Stickler@nokia.com wrote:
> 
> > > On Fri, 20 Aug 2004 Patrick.Stickler@nokia.com wrote:
> > >
> > > > > > A draft of an updated, expanded, stand-alone definition for
> > > > > > Concise Bounded Descriptions is now available
> > > > > >
> > > > > >   http://swdev.nokia.com/uriqa/CBD.html
> > > > > >
> > > > > [snip]
> > > > >
> > > > > Great to have this on its own page as a point of reference!
> > > > > However, I have a problem with the new concept of the inverse
> > > > > functional bounded description: It requires that both the
> > > > > sending and receiving agents are schema/ontology-aware, and
> > > > > also that they share the same schema/ontology-knowledge, in
> > > > > order to correctly create and interpret a CBD.
> > > > >
> > > > > For one, the sender needs to know that a given predicate is an
> > > > > owl:InverseFunctionalProperty, so it can pick the "if"-branch
> > > > > of the IFBD definition for an anonymous resource. However, this
> > > > > knowledge may not always be available, e.g. in case of a simple
> > > > > semantic web crawler. AFAIK the issue of finding all
> > > > > schemata/ontologies for a given RDF graph is not solved in
> > > > > general yet - or is it?
> > > >
> > > > If the sending agent is not aware that a given property is
> > > > an owl:InverseFunctionalProperty, then it proceeds as if it
> > > > is not.
> > > >
> > > Ok, seems like I interpreted too much into the definition. I took
> > > it as a MUST, and without getting too philosophical, an IFP *is*
> > > an IFP whether I know it or not. But from your reply I gather it
> > > should read something like "where the predicate *is known to be*
> > > an owl:InverseFunctionalProperty".
> > >
> > > > Then again, if the agent has no knowledge about the property,
> > > > it (ideally) would be able to submit a URIQA request to the
> > > > metadata authority and obtain the information it needs.
> > > >
> > > Yes, this makes sense for a more complex agent. But I was thinking
> > > of a simpler case, such as a passive RDF database with a frontend
> > > for answering queries with CBDs (e.g. via URIQA :-)
> > >
> > > Also, what is the "metadata authority" you mention?
> >
> > Whoever controls the response to a URIQA request (e.g. MGET)
> > (per the web authority of the URI). So, e.g., for the resource
> > denoted by http://www.example.com/blargh the web authority
> > is www.example.com and thus a description from www.example.com
> > is the authoritative description (as opposed to, e.g. a description
> > obtained from any other source, even if via the URIQA servlet
> > API, e.g.
> > http://www.google.com/uriqa?uri=http://www.example.com/blargh).
> >
> > >
> > > > However, the definition/generation of CBDs still works just
> > > > fine if such information is not available -- in fact, it is
> > > > then the same as the original definition of CBDs.
> > > >
> > > > > Furthermore, the receiver also needs to know that a given
> > > > > predicate is an IFP. This is a more serious issue, as it needs
> > > > > this to determine whether the "if"- or the "else"-branch of the
> > > > > IFBD definition was picked by the sender. In the "else" case,
> > > > > it already has all known statements, but in the "if" case it
> > > > > might need to issue another query (by IFP).
> > > > >
> > > > > Consequently, if the IFP is unknown to the receiver, it might
> > > > > falsely conclude that it already got all information the sender
> > > > > had on the resource.
> > > >
> > > > Well, I think it is fair to presume that if the receiving agent
> > > > is going to do anything particularly useful with (i.e. make
> > > > decisions based on) the received knowledge, that it will have to
> > > > be aware of the vocabularies/ontologies in which that knowledge
> > > > is expressed.
> > > >
> > > Agreed. However, the open nature of RDF implies that an agent does
> > > not need to understand every statement in a graph. If a receiver
> > > does not understand the IFP the sender used to "prune" the CBD, it
> > > will mistake the pruned graph for the whole thing. IMHO there
> > > should be a way to distinguish the two cases.
> >
> > I appreciate your point. I'm just not convinced that the subgraph
> > returned as a CBD needs to contain such process-specific knowledge.
> >
> > E.g. one could include a statement
> >
> >    ?x rdf:type cbd:InverseFunctionalPropertyDistinguished .
> >
> > or some such triple to indicate which anonymous nodes are
> > uniquely distinguished by inverse functional properties and
> > which are not.
> >
> > Or one could include, as you suggested, statements about the
> > properties themselves.
> >
> > But I'd need to see a strongly motivating use case for doing
> > something like that. I.e. the goal is to keep CBDs as, er,
> > "concise" as possible, so any knowledge to be included needs
> > to fight hard to win a place in a CBD.
> >
> > Since I envision a semantic web where agents can obtain
> > authoritative CBDs via dereferencable URIs, the inclusion of
> > information such as above is not IMO sufficiently justified
> > (in the long term, at least).
> >
> > >
> > > > Also, and again, I personally do not see CBDs as a complete
> > > > solution to knowledge interchange between semantic web agents.
> > > > Something equivalent or comparable to URIQA must also exist so
> > > > that agents can further obtain the knowledge they need.
> > > >
> > > Of course. But my point is that the receiver cannot know that it
> > > needs to send another query in the problematic case. In fact, as
> > > it does not know the relevant IFP, it does not even have the
> > > necessary parameters for the query.
> >
> > It does not have to know the IFPs. It can use the triples it has
> > been given as the template. If there are IFPs, then all of those
> > triples will include IFPs. If there are no IFPs, then none will
> > include IFPs, and the query may identify more than one resource.
> >
> > Still, it would make more sense to me for the agent to ask about
> > the properties it has never seen before, expanding its knowledge
> > accordingly, before trying to use a half-blind brute force
> > series of queries to extract additional knowledge about
> > anonymous node denoted resources.
> >
> > >
> > > > > I see two possible solutions to this problem: The CBD could
> > > > > contain the relevant "ppp rdf:type owl:InverseFunctionalProperty"
> > > > > statements, or indicate all relevant ontologies by way of
> > > > > owl:imports. However, neither solution is viable for RDF-only
> > > > > cases, such as querying the aforementioned simple spider agent.
> > > >
> > > > Or, if the receiving agent has no knowledge about those
> > > > properties, it can either submit a URIQA MGET request, or ask
> > > > the same source of that knowledge for additional knowledge about
> > > > those properties. I.e., ask the sending agent what it knows
> > > > about those properties (by sending the CBD of each, etc.)
> > > >
> > > > I see the definition of CBDs as a component of a general
> > > > bootstrapping mechanism for the semantic web, not as an all
> > > > encompassing solution to knowledge interchange.
> > > >
> > > Yes, and I don't expect them to be anything else. But I was
> > > thinking of a scenario where the receiving agent has limited
> > > resources (memory, bandwidth, CPU power), for example because it
> > > resides on a tiny embedded device. For that reason it cannot cache
> > > all ontologies it might encounter, or ask for the precise
> > > definition of everything it finds. But it can answer simple queries
> > > on the triple level, like "find me things of rdf:type dev:Printer
> > > with foo:location bar:Room5". The lookup service has some _:p in
> > > its database that matches these criteria, but also has the IFP
> > > comp:uniqueDeviceNumber. The agent does not know the comp ontology,
> > > so it does not know the information about _:p is pruned, and would
> > > match if another query were formed. (Sorry for this hasty example)
> >
> > I certainly am sympathetic to agents running in limited environments
> > (after all, I'm a semantic web researcher working for Nokia ;-)
> > but again, my experience has been that applications that deal with
> > knowledge expressed with ontologies employing IFPs are aware of
> > which IFPs are important -- and even more so, are highly selective
> > of knowledge which syncs with their own limited vocabularies and
> > disregard the rest (so if the embedded agent doesn't already know
> > it's an IFP, it doesn't care and won't bother about it anyway).
> >
> > Again, I appreciate your point, but would like to see some hard
> > and real use cases and experience demonstrate the need, rather
> > than expanding the scope of CBDs merely "on a hunch" or as a matter
> > of esthetics or "just in case" arguments.
> >
> > >
> > > > > By the way this seems to be a more general case of the
> > > > > "crossing layer boundaries"-problem previously discussed (but
> > > > > not solved) in another mailing list thread [1].
> > > >
> > > > Well, given the examples you present in that referenced
> > > > document, I would say that those "missing triples" are provided
> > > > for by the closure rules defined in the RDF model theory. Triples
> > > > that can be inferred are not the same as triples which are simply
> > > > not included in a graph, but must be obtained separately.
> > > >
> > > Well, I think the cases are similar in that there the receiver is
> > > supposed to understand rdfs:subClassOf etc, where here it is
> > > supposed to understand owl:InverseFunctionalProperty. If the
> > > receiver does not understand these, it will fail to process the
> > > information in the way the sender intended.
> >
> > That is a much broader issue and a general challenge for achieving
> > a critical mass of deployed semantic web solutions.
> >
> > I don't see that the definition of CBDs directly helps or hinders
> > that problem (though I see that URIQA most certainly would help
> > substantially).
> >
> > Thanks for the engaging questions. I hope you don't feel I'm in
> > any way blowing you off and not taking your points seriously. I'm
> > simply not convinced that there is a critical problem that would
> > be solved by changing the definition of CBDs as opposed to other
> > approaches/solutions.
> >
> > Cheers,
> >
> > Patrick
> >
>
Received on Saturday, 21 August 2004 07:50:46 UTC