Re: problems with concise bounded descriptions from Eric Prud'hommeaux on 2004-10-01 (www-rdf-interest@w3.org from October 2004)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 30 Sep 2004 22:17:32 -0400
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Cc: www-rdf-interest@w3.org
Message-ID: <20041001021732.GB19027@w3.org>
Not being the author, I will address your points to the best of my
understanding. But I would also like to point out that something like
a CBD would allow information servers to respond to queries without
any specific understanding of the queried object.  Clients would be
able to expect a certain pattern from such queries.  The world would
be a little bit more communicative and predictable.

Client C1 wants to know about a resource R1. Server S1 has some graph
that involves R1. S1 will respond with whatever it wants.
If it knows the type, it will typically respond with an
application-specific graph, for instance, a graph that's particularly
suited to describing a foaf:Person. If it doesn't know how to, or doesn't
care to, tailor the response, it can send its notion of a generically
helpful graph. The Annotea server responds with the result of a subject
query, that is, all the arcs coming from the queried node. Arcs out seemed
to be more useful than arcs in, and we never had a compelling reason to do
both. In this sense, the Annotea response is a cheaper but less
helpful form of a CBD. It worked for our purposes, but a CBD would be
more helpful in a bNode-laden graph, since it also follows the arcs out
of any bNodes it reaches.
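To make the difference concrete, here is a minimal sketch of a subject query over an in-memory set of triples. The triples, the `ex:` names, and the convention that blank nodes start with "_:" and literals with a double quote are all assumptions for illustration; this is not Annotea's code.

```python
# Minimal sketch: triples are (subject, predicate, object) string tuples.
# Assumed conventions: blank nodes start with "_:", literals with a double quote.
Triple = tuple[str, str, str]


def subject_query(graph: set[Triple], node: str) -> set[Triple]:
    """Annotea-style answer: every arc whose subject is `node`, and nothing more."""
    return {t for t in graph if t[0] == node}


# Hypothetical annotation whose body is a blank node.
graph: set[Triple] = {
    ("ex:annot1", "ex:body", "_:b1"),
    ("_:b1", "ex:title", '"A note"'),
    ("_:b1", "ex:text", '"Looks fine."'),
}

print(sorted(subject_query(graph, "ex:annot1")))
# [('ex:annot1', 'ex:body', '_:b1')]
```

The client learns only that the body is some blank node. It cannot name _:b1 in a later request, so the title and text are effectively unreachable; a CBD would have included them.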

A client programmer that expects either an application-specific
response or a CBD can more effectively use the returned data. Without
a convention, different services will respond with their own slant on
what's helpful. An ontologist may choose to respond with any type arcs
coming from R1, plus a cloud of ontology surrounding those types. This
information may be helpful for the Protégé user but not for the foaf
crawler. Another app may choose to respond with arcs-in and arcs-out.
Given no convention, the client programmer must deal with all likely
responses. With a convention, he/she may expect something at least as
rich as a CBD, and maybe more, if the server has a special
understanding of R1. This puts the burden on the Protégé user to ask a
special query to get the ontology cloud, but at least the clients
know what to reasonably expect and code for.

The main problem *I* see with CBDs is that they favor a particular
expression of data, i.e. arcs-out rather than arcs-in. This could bias
developers as they may wish to make sure that their data is
expressible in a CBD even at some cost to clarity. I think the recipe
also needs some text to deal with cyclic graphs of bNodes, but that's
a minor point.
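One way the recipe could handle cyclic bNode graphs is to remember which bNodes have already been expanded. The sketch below assumes the same string-tuple triples as before ("_:" marks a blank node); it is an illustration of a visited-set traversal, not text from the submission.

```python
Triple = tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")


def cbd(graph: set[Triple], start: str) -> set[Triple]:
    """Arcs out of `start`, plus (recursively) arcs out of every blank-node object.

    The `seen` set keeps the walk finite when blank nodes form a cycle;
    without it, _:a -> _:b -> _:a would be expanded forever.
    """
    result: set[Triple] = set()
    seen = {start}
    todo = [start]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if s != node:
                continue
            result.add((s, p, o))
            if is_bnode(o) and o not in seen:
                seen.add(o)
                todo.append(o)
    return result


cyclic: set[Triple] = {
    ("ex:R1", "ex:knows", "_:a"),
    ("_:a", "ex:r", "_:b"),
    ("_:b", "ex:r", "_:a"),
}
print(len(cbd(cyclic, "ex:R1")))  # 3: every arc is reached exactly once
```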

On Thu, Sep 30, 2004 at 07:39:32PM -0400, Peter F. Patel-Schneider wrote:
> 
> In the DAWG message archive I came across a reference to a W3C member
> submission from Nokia on Concise Bounded Descriptions
> http://www.w3.org/Submission/CBD/.
> 
> The notion of Concise Bounded Descriptions (CBD) in this note has a number
> of problems.
> 
> The initial description of a CBD is severely underspecified.  According to
> the note, ``A [CBD] of a resource is a body of knowledge about that
> resource which does not include any explicit knowledge about any other
> resource which can be obtained separately from the same source.''
> 
> Problem 1:  Which source?

The query service.

> Problem 2:  What is ``explicit'' knowledge?

I'm not sure I would have chosen ``explicit'', but I believe this is
the set of arcs-out from a resource which is reached in a CBD
traversal. All arcs-out from R1 are included in the CBD. If that graph
involves R2 (and R2 isn't a literal or bNode), the client can ask
about R2 in a separate request. Thus, arcs-out from R2 are not
included in R1's CBD.
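A sketch of that boundary from the client's side, under the same assumed conventions ("_:" for blank nodes, a leading double quote for literals): blank-node objects are expanded inside R1's CBD, while named resources such as ex:R2 are left for a subsequent query. The `follow_ups` helper and the data are hypothetical.

```python
Triple = tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")  # assumed blank-node syntax


def is_literal(term: str) -> bool:
    return term.startswith('"')  # assumed literal syntax


def cbd(graph: set[Triple], start: str) -> set[Triple]:
    """Arcs out of `start`; blank-node objects are expanded, named resources are not."""
    result: set[Triple] = set()
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o) and o not in seen:
                    seen.add(o)
                    todo.append(o)
    return result


def follow_ups(description: set[Triple]) -> set[str]:
    """Hypothetical helper: named resources the client may ask about separately."""
    return {o for _, _, o in description if not is_bnode(o) and not is_literal(o)}


store: set[Triple] = {
    ("ex:R1", "foaf:name", '"Alice"'),
    ("ex:R1", "foaf:knows", "ex:R2"),
    ("ex:R2", "foaf:name", '"Bob"'),
}

d1 = cbd(store, "ex:R1")  # R1's two arcs only; nothing about R2's name
for r in sorted(follow_ups(d1)):
    print(r, sorted(cbd(store, r)))  # the client's separate request about R2
```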

Perhaps ``minutiae'' would be better?

> Problem 3:  What is ``obtain separately''?

Subsequent query.

> Problem 4:  A function that always returns nothing satisfies this
> description, as it certainly does not include any knowledge (explicit or
> not) that can be obtained (separately or not) from the same source (or indeed
> any source at all).

Yes, but it is not compliant with the recipe in the
specification. Perhaps the description could be amended to make it
clearer, but I wouldn't expect it to stand on its own as the
definition.

> The definition of CBD in terms of a procedure on RDF graphs also has
> serious problems.
> 
> Problem 5:  Given a node in an RDF graph, there is no general way of
> determining which nodes in the graph are co-denotational with that node.
> Consider, for example, the RDF graph:
> 	_:a ex:b _:c .
> 	_:d ex:e _:f .
> What is the CBD of _:a in this graph?

Being a pragmatist (for which I receive the occasional slap), I would
say we are responding with a CBD of what we *do* know about _:a, and
thus return only the first arc. If we later learn that _:a and _:d
are the same node, and the client queries again, they get more arcs, but
nothing contradictory.
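That monotonic behaviour can be checked on Peter's example. The sketch assumes the server, on learning that _:a and _:d co-denote, simply rewrites _:d as _:a; `merge_nodes` is a hypothetical helper, and the "_:" convention is assumed as before.

```python
Triple = tuple[str, str, str]


def cbd(graph: set[Triple], start: str) -> set[Triple]:
    """Arcs out of `start`, following blank-node ("_:") objects, cycle-safe."""
    result: set[Triple] = set()
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if o.startswith("_:") and o not in seen:
                    seen.add(o)
                    todo.append(o)
    return result


def merge_nodes(graph: set[Triple], keep: str, drop: str) -> set[Triple]:
    """Hypothetical helper: rewrite every occurrence of `drop` as `keep`."""
    return {tuple(keep if term == drop else term for term in t) for t in graph}


known: set[Triple] = {("_:a", "ex:b", "_:c"), ("_:d", "ex:e", "_:f")}

before = cbd(known, "_:a")  # only the first arc
after = cbd(merge_nodes(known, keep="_:a", drop="_:d"), "_:a")
print(before <= after)  # True: the later answer only adds arcs
```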

> Problem 6:  This definition does not satisfy the initial description of a
> CBD.  Consider, for example, the RDF graph:
> 	ex:a ex:b ex:c .
> 	ex:r rdf:type rdf:Statement .
> 	ex:r rdf:subject ex:a .
> 	ex:r rdf:predicate ex:b .
> 	ex:r rdf:object ex:c .
> the CBD of ex:a in this graph is the graph itself, but it includes explicit
> information about ex:r, a potentially different resource.
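As I read the submission's recipe, the reification arcs come in through a separate step: for each statement already in the description, the description of every reification of that statement is added too. The sketch below follows that reading. It assumes a reification is any node with matching rdf:subject, rdf:predicate, and rdf:object arcs (rdf:type rdf:Statement is not required here), and it reproduces Peter's observation that the CBD of ex:a is the whole graph.

```python
Triple = tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")  # assumed blank-node syntax


def arcs_out_closure(graph: set[Triple], start: str) -> set[Triple]:
    """Arcs out of `start`, following blank-node objects (cycle-safe)."""
    result: set[Triple] = set()
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o) and o not in seen:
                    seen.add(o)
                    todo.append(o)
    return result


def reifications(graph: set[Triple], triple: Triple) -> set[str]:
    """Nodes whose rdf:subject/rdf:predicate/rdf:object arcs match `triple` (assumed test)."""
    s, p, o = triple
    return {
        r for r, pred, obj in graph
        if pred == "rdf:subject" and obj == s
        and (r, "rdf:predicate", p) in graph
        and (r, "rdf:object", o) in graph
    }


def cbd_with_reifications(graph: set[Triple], start: str) -> set[Triple]:
    """For each included statement, also include the description of each of its reifications."""
    result = arcs_out_closure(graph, start)
    expanded: set[str] = set()
    pending = list(result)
    while pending:
        for r in reifications(graph, pending.pop()):
            if r not in expanded:
                expanded.add(r)
                new = arcs_out_closure(graph, r) - result
                result |= new
                pending.extend(new)  # reifications may themselves be reified
    return result


peter: set[Triple] = {
    ("ex:a", "ex:b", "ex:c"),
    ("ex:r", "rdf:type", "rdf:Statement"),
    ("ex:r", "rdf:subject", "ex:a"),
    ("ex:r", "rdf:predicate", "ex:b"),
    ("ex:r", "rdf:object", "ex:c"),
}
print(cbd_with_reifications(peter, "ex:a") == peter)  # True
```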

I haven't really explored CBDs of reifications. Patrick, do you have
any fun use cases for this? Regardless, Peter, do you have any
suggested words for Patrick to include the reification arcs in the
initial description?

> Problem 7:  This definition does not provide enough information to
> distinguish the node from other distinguishable nodes in the graph.
> Consider, for example, the RDF graph: 
> 	ex:r rdf:type owl:InverseFunctionalProperty .
> 	_:a ex:r _:b .
> 	_:b ex:r _:a .
> 	_:a ex:s "NODE A" .
> 	_:b ex:s "NODE B" .
> Then the CBD of _:a in this graph is
> 	_:x1 ex:r _:x2 .
> 	_:x2 ex:r _:x1 .
> which is the same as the CBD of _:b in this graph but _:a and _:b are
> distinguishable in the graph and thus should have different CBDs.

Yeah, but nothing else solves that either. They're ambiguous to the
server and they're ambiguous to the client. The only additional info
that the server has is that there exists in the domain of discourse
another bNode. I don't think it's worth telling the client about it.

> (Definition: Two blank nodes, n1 and n2, are indistinguishable in a graph G
> if G with n1 mapped to n2 and n2 mapped to n1 is graph-equal to G (i.e.,
> the sets of triples are exactly the same).  Any node is indistinguishable
> from itself.  Two literal nodes are indistinguishable if they mean the same
> literal value.  All other pairs of nodes are distinguishable.)
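The blank-node clause of Peter's definition can be checked mechanically: swap the two nodes everywhere and compare the resulting set of triples with the original. The sketch below covers only that clause (and the trivial "same node" case); the literal-value clause is not implemented, and the string-tuple representation is assumed as before.

```python
Triple = tuple[str, str, str]


def swap(graph: set[Triple], n1: str, n2: str) -> set[Triple]:
    """Map n1 to n2 and n2 to n1 in every triple."""
    m = {n1: n2, n2: n1}
    return {tuple(m.get(term, term) for term in t) for t in graph}


def indistinguishable(graph: set[Triple], n1: str, n2: str) -> bool:
    """Blank-node clause only: the swapped graph must equal the original graph."""
    return n1 == n2 or swap(graph, n1, n2) == graph


g: set[Triple] = {
    ("ex:r", "rdf:type", "owl:InverseFunctionalProperty"),
    ("_:a", "ex:r", "_:b"),
    ("_:b", "ex:r", "_:a"),
    ("_:a", "ex:s", '"NODE A"'),
    ("_:b", "ex:s", '"NODE B"'),
}
cbd_only: set[Triple] = {("_:x1", "ex:r", "_:x2"), ("_:x2", "ex:r", "_:x1")}

print(indistinguishable(g, "_:a", "_:b"))  # False: the ex:s literals tell them apart
print(indistinguishable(cbd_only, "_:x1", "_:x2"))  # True
```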
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Friday, 1 October 2004 02:17:32 UTC