Re: problems with concise bounded descriptions from Eric Prud'hommeaux on 2004-10-01 (www-rdf-interest@w3.org from October 2004)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 1 Oct 2004 04:25:17 -0400
To: Patrick.Stickler@nokia.com
Cc: Dan Brickley <danbri@w3.org>, pfps@research.bell-labs.com, www-rdf-interest@w3.org
Message-ID: <20041001082231.GW20897@w3.org>
On Fri, Oct 01, 2004 at 09:28:31AM +0300, Patrick.Stickler@nokia.com wrote:
> 
> 
> > -----Original Message-----
> > From: www-rdf-interest-request@w3.org
> > [mailto:www-rdf-interest-request@w3.org]On Behalf Of ext Eric
> > Prud'hommeaux
> > Sent: 01 October, 2004 05:18
> > To: Peter F. Patel-Schneider
> > Cc: www-rdf-interest@w3.org
> > Subject: Re: problems with concise bounded descriptions
> > 
> > 
> > Not being the author, I will address your points to the best of my
> > understanding. 
> 
> I'm working on a posting to address the issues raised in your
> comments to the submission.
> 
> I'm also working on my own response to Peter's comments, and will
> at the same time digest what you've responded here. But shortly, a
> comment about the following:
> 
> > The main problem *I* see with CBDs is that they favor a particular
> > expression of data, i.e. arcs-out rather than arcs-in. This could bias
> > developers as they may wish to make sure that their data is
> > expressible in a CBD even at some cost to clarity. 

Just to clarify my point, I'm talking about a human-engineering issue:
Would the proliferation of CBDs encourage developers of vocabularies
to take advantage of them even though it incurs some cost to modeling
consistency or clarity?

I don't think it can be answered by a technical argument, but instead
has to be cast in terms of benefit vs. cost.

> Firstly, CBDs are resource-centric, and meant to be the most concise,
> smallest body of knowledge about a particular named resource that
> an agent can obtain per a single request, based on the URI denoting
> that resource.
> 
> Secondly, CBDs are not intended as a replacement/alternate to a more
> general query solution.

Ahh, but will they, to whatever degree?

> Thirdly, CBDs are not intended to be the only possible form of response
> to a question "tell me about this thing".
> 
> Fourthly, CBDs identify a subset of a graph, and I honestly can't
> imagine how that would constrain or influence how a given developer
> would express knowledge about resources, since even if CBDs are
> provided by some service, there will likely be other means of access
> to that information. So I'd need to see some pretty explicit and
> motivating use cases before I'm convinced that your main problem with
> CBDs is a real issue.

Dan Brickley has some experience here. He said that RSS authors asked
to have foaf:depiction reversed and called foaf:depicts so that it
would be easier to express as a tree in RDF/XML. The same happened with
foaf:made and foaf:maker. Maybe the influence of CBDs will be
different from that of the convenient expression of RDF/XML, but the
parallel is clear.
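
To make the direction bias concrete, here is a minimal sketch in
Python. It assumes a graph held as a plain set of (subject, predicate,
object) tuples; the arcs_out/arcs_in helpers and the example.org URIs
are hypothetical, made up for illustration. Only foaf:depiction and
foaf:depicts come from the discussion above. An arcs-out view of the
image sees the link to the person only when the image-to-person
property is used:

```python
FOAF = "http://xmlns.com/foaf/0.1/"

def arcs_out(graph, node):
    """Triples whose subject is `node`."""
    return {t for t in graph if t[0] == node}

def arcs_in(graph, node):
    """Triples whose object is `node`."""
    return {t for t in graph if t[2] == node}

# Hypothetical resources, for illustration only.
person = "http://example.org/people#alice"
image = "http://example.org/photos/1.jpg"

# Modeled with foaf:depiction only: the arc points from person to image.
only_depiction = {(person, FOAF + "depiction", image)}

# Same fact, also stated in the image-to-person direction.
with_depicts = only_depiction | {(image, FOAF + "depicts", person)}

print(arcs_out(only_depiction, image))  # nothing: the fact is an arc-in
print(arcs_in(only_depiction, image))   # the foaf:depiction triple
print(arcs_out(with_depicts, image))    # the foaf:depicts triple
```

Someone who starts from the image and wants an arcs-out description to
mention the person therefore has a reason to prefer foaf:depicts,
whatever the modeling merits.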

> Finally, while there can be some application areas where "arcs-in"
> information is useful/necessary, in many applications, it can result
> in a huge number of statements in a graph. As an extreme case, consider
> a request for a CBD for rdf:Resource where inference is enabled...
> Again, CBDs are not intended to replace a general query facility, or
> some other form of "resource view" which would accommodate retrieval
> of "arcs-in" knowledge.

Points 1 and 4 have heuristic support from the way people have
tended to model data, but they do make an assumption about how it's
organized. Serialization and model constraints (no literals as
subjects) do encourage this model.

> > I think the recipe
> > also needs some text to deal with cyclic graphs of bNodes, but that's
> > a minor point.
> 
> Agreed. And thanks for pointing that issue out. I also agree
> that it's a minor issue and can be fixed with a single check in the
> algorithm to avoid infinite loops.
> 
> (the present implementation is expressed as inference rules, not
> as a linear set of steps, so this problem does not arise, hence
> it being overlooked).
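
As a rough illustration of what that single check might look like when
the recipe is written as linear steps, here is a Python sketch. It is
not the submission's implementation. It assumes the graph is a set of
(subject, predicate, object) string tuples, that blank nodes are the
strings starting with "_:", and it leaves out the reification step of
the recipe entirely. The names cbd, is_bnode and the ex: terms are
hypothetical.

```python
def is_bnode(term):
    # Assumption for this sketch: blank nodes are written "_:label".
    return isinstance(term, str) and term.startswith("_:")

def cbd(graph, start):
    """Collect arcs-out from `start`, expanding blank-node objects only."""
    description = set()
    visited = set()      # the single check: subjects already expanded
    pending = [start]
    while pending:
        node = pending.pop()
        if node in visited:
            continue     # a bNode cycle led back here; do not loop
        visited.add(node)
        for triple in graph:
            subject, _, obj = triple
            if subject == node:
                description.add(triple)
                if is_bnode(obj):
                    pending.append(obj)
    return description

graph = {
    ("ex:doc", "ex:first", "_:x"),
    ("_:x", "ex:next", "_:y"),
    ("_:y", "ex:next", "_:x"),          # cycle between two bNodes
    ("_:y", "ex:seeAlso", "ex:other"),
    ("ex:other", "ex:label", '"unrelated"'),
}

for triple in sorted(cbd(graph, "ex:doc")):
    print(triple)
```

The traversal terminates despite the _:x/_:y cycle, and the arc out of
ex:other is left for a separate request because ex:other is named.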
> 
> Cheers,
> 
> Patrick
> 
> > On Thu, Sep 30, 2004 at 07:39:32PM -0400, Peter F. Patel-Schneider wrote:
> > > 
> > > In the DAWG message archive I came across a reference to a W3C member
> > > submission from Nokia on Concise Bounded Descriptions
> > > http://www.w3.org/Submission/CBD/.
> > > 
> > > The notion of Concise Bounded Descriptions (CBD) in this note has a
> > > number of problems.
> > > 
> > > The initial description of a CBD is severely underspecified.  According
> > > to the note, ``A [CBD] of a resource is a body of knowledge about that
> > > resource which does not include any explicit knowledge about any other
> > > resource which can be obtained separately from the same source.''
> > > 
> > > Problem 1:  Which source?
> > 
> > The query service.
> > 
> > > Problem 2:  What is ``explicit'' knowledge?
> > 
> > I'm not sure I would have chosen ``explicit'', but I believe this is
> > the set of arcs-out from a resource which is reached in a CBD
> > traversal. All arcs-out from R1 are included in the CBD. If that graph
> > involves R2 (and R2 isn't a literal or bNode), the client can ask
> > about R2 in a separate request. Thus, arcs-out from R2 are not
> > included in R1's CBD.
> > 
> > Perhaps ``minutiae'' would be better?
> > 
> > > Problem 3:  What is ``obtain separately''?
> > 
> > Subsequent query.
> > 
> > > Problem 4:  A function that always returns nothing satisfies this
> > > description, as it certainly does not include any knowledge (explicit
> > > or not) that can be obtained (separately or not) from the same source
> > > (or indeed any source at all).
> > 
> > Yes, but it is not compliant with the recipe in the
> > specification. Perhaps the description could be amended to make it
> > clearer, but I wouldn't expect it to stand on its own as the
> > definition.
> > 
> > > The definition of CBD in terms of a procedure on RDF graphs also has
> > > serious problems.
> > > 
> > > Problem 5:  Given a node in an RDF graph, there is no general way of
> > > determining which nodes in the graph are co-denotational with that
> > > node.  Consider, for example, the RDF graph:
> > > 	_:a ex:b _:c .
> > > 	_:d ex:e _:f .
> > > What is the CBD of _:a in this graph?
> > 
> > Being a pragmatist (for which I receive the occasional slap), I would
> > say we are responding with a CBD of what we *do* know about _:a, and
> > thus return only the first arc. If we later learn that _:a and _:d
> > are the same node, and the client queries again, they get more arcs, but
> > nothing contradictory.
> > 
> > > Problem 6:  This definition does not satisfy the initial description
> > > of a CBD.  Consider, for example, the RDF graph:
> > > 	ex:a ex:b ex:c .
> > > 	ex:r rdf:type rdf:Statement .
> > > 	ex:r rdf:subject ex:a .
> > > 	ex:r rdf:predicate ex:b .
> > > 	ex:r rdf:object ex:c .
> > > the CBD of ex:a in this graph is the graph itself, but it includes
> > > explicit information about ex:r, a potentially different resource.
> > 
> > I haven't really explored CBDs of reifications. Patrick, do you have
> > any fun use cases for this? Regardless, Peter, do you have any
> > suggested words for Patrick to include the reification arcs in the
> > initial description?
> > 
> > > Problem 7:  This definition does not provide enough information to
> > > distinguish the node from other distinguishable nodes in the graph.
> > > Consider, for example, the RDF graph: 
> > > 	ex:r rdf:type owl:InverseFunctionalProperty .
> > > 	_:a ex:r _:b .
> > > 	_:b ex:r _:a .
> > > 	_:a ex:s "NODE A" .
> > > 	_:b ex:s "NODE B" .
> > > Then the CBD of _:a in this graph is
> > > 	_:x1 ex:r _:x2 .
> > > 	_:x2 ex:r _:x1 .
> > > which is the same as the CBD of _:b in this graph but _:a and _:b are
> > > distinguishable in the graph and thus should have different CBDs.
> > 
> > Yeah, but nothing else solves that either. They're ambiguous to the
> > server and they're ambiguous to the client. The only additional info
> > that the server has is that there exists in the domain of discourse
> > another bNode. I don't think it's worth telling the client about it.
> > 
> > > (Definition: Two blank nodes, n1 and n2, are indistinguishable in a
> > > graph G if G with n1 mapped to n2 and n2 mapped to n1 is graph-equal
> > > to G (i.e., the sets of triples are exactly the same).  Any node is
> > > indistinguishable from itself.  Two literal nodes are indistinguishable
> > > if they mean the same literal value.  All other pairs of nodes are
> > > distinguishable.)
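
The blank-node part of that definition can be checked mechanically.
The following Python sketch assumes a graph is a set of (subject,
predicate, object) string tuples and covers only the swap test for two
nodes; comparing literals by value is not attempted. The function
names are hypothetical, and the data is the graph from Problem 7.

```python
def swap(term, n1, n2):
    """Map n1 to n2 and n2 to n1; leave every other term alone."""
    if term == n1:
        return n2
    if term == n2:
        return n1
    return term

def indistinguishable(graph, n1, n2):
    """True if exchanging n1 and n2 yields exactly the same set of triples."""
    swapped = {tuple(swap(term, n1, n2) for term in triple) for triple in graph}
    return swapped == graph

problem7 = {
    ("ex:r", "rdf:type", "owl:InverseFunctionalProperty"),
    ("_:a", "ex:r", "_:b"),
    ("_:b", "ex:r", "_:a"),
    ("_:a", "ex:s", '"NODE A"'),
    ("_:b", "ex:s", '"NODE B"'),
}

# Same graph without the two ex:s arcs that tell the nodes apart.
without_labels = {t for t in problem7 if t[1] != "ex:s"}

print(indistinguishable(problem7, "_:a", "_:b"))
print(indistinguishable(without_labels, "_:a", "_:b"))
```

With the ex:s arcs present the swap changes the triple set, so the two
nodes are distinguishable; drop those arcs and the swap leaves the
graph unchanged.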
> > -- 
> > -eric
> > 
> > office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
> >                         Shonan Fujisawa Campus, Keio University,
> >                         5322 Endo, Fujisawa, Kanagawa 252-8520
> >                         JAPAN
> >         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
> > cell:   +1.857.222.5741 (does not work in Asia)
> > 
> > (eric@w3.org)
> > Feel free to forward this message to any list for any purpose other than
> > email address distribution.
> > 

-- 
-eric

Received on Friday, 1 October 2004 08:25:17 UTC