RE: problems with concise bounded descriptions from Patrick.Stickler@nokia.com on 2004-10-01 (www-rdf-interest@w3.org from October 2004)

From: <Patrick.Stickler@nokia.com>
Date: Fri, 1 Oct 2004 12:55:42 +0300
To: <eric@w3.org>
Cc: <danbri@w3.org>, <pfps@research.bell-labs.com>, <www-rdf-interest@w3.org>
Message-ID: <1E4A0AC134884349A21955574A90A7A50ADCFB@trebe051.ntc.nokia.com>
> -----Original Message-----
> From: ext Eric Prud'hommeaux [mailto:eric@w3.org]
> Sent: 01 October, 2004 11:25
> To: Stickler Patrick (Nokia-TP-MSW/Tampere)
> Cc: Dan Brickley; pfps@research.bell-labs.com; www-rdf-interest@w3.org
> Subject: Re: problems with concise bounded descriptions
> 
> 
> On Fri, Oct 01, 2004 at 09:28:31AM +0300, Patrick.Stickler@nokia.com wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: www-rdf-interest-request@w3.org
> > > [mailto:www-rdf-interest-request@w3.org]On Behalf Of ext Eric
> > > Prud'hommeaux
> > > Sent: 01 October, 2004 05:18
> > > To: Peter F. Patel-Schneider
> > > Cc: www-rdf-interest@w3.org
> > > Subject: Re: problems with concise bounded descriptions
> > > 
> > > 
> > > Not being the author, I will address your points to the best of my
> > > understanding. 
> > 
> > I'm working on a posting to address the issues raised in your
> > comments to the submission.
> > 
> > I'm also working on my own response to Peter's comments, and will
> > at the same time digest what you've responded here. But shortly, a
> > comment about the following:
> > 
> > > The main problem *I* see with CBDs is that they favor a particular
> > > expression of data, i.e. arcs-out rather than arcs-in. This could bias
> > > developers as they may wish to make sure that their data is
> > > expressible in a CBD even at some cost to clarity.
> 
> Just to clarify my point, I'm talking about a human-engineering issue:
> Would the proliferation of CBDs encourage developers of vocabularies
> to take advantage of them even though it incurs some cost to modeling
> consistency or clarity?
> 
> I don't think it can be answered by a technical argument, but instead
> has to be cast in terms of benefit vs. cost.

I can't say that I fully grasp the point you are trying to
make here, though a glimmer of it seems to be coming through.

As for utility/cost/etc., the CBD submission is simply Nokia sharing
with others what we have found to work well, be very useful, and
likely to benefit others as well.

We do not assert that it is perfect, either for any particular
application, or even for a majority of applications.

So I don't intend to get into any long drawn out debates that
split hairs about particular points motivated by use cases that
I cannot directly relate to or do not fully understand.

If others (including the DA WG) find CBDs useful, Great. 
If CBDs can be made better, Great.
If the specification can be made clearer, Great.

If CBDs do not meet anyone's particular needs or preferences in 
any way, then just ignore our submission and do your own thing.

> > Firstly, CBDs are resource-centric, and meant to be the most concise,
> > smallest body of knowledge about a particular named resource that
> > an agent can obtain per a single request, based on the URI denoting
> > that resource.
> > 
> > Secondly, CBDs are not intended as a replacement/alternate to a more
> > general query solution.
> 
> Ahh, but will they, to whatever degree?

In applications that do not *need* to use a more comprehensive
query facility, why would you *want* to implement a more
comprehensive query facility?

You know...  use cases -> requirements -> design -> deployment

CBDs are not a theoretical exercise. They meet a specific need. 

> > Thirdly, CBDs are not intended to be the only possible form of response
> > to a question "tell me about this thing".
> > 
> > Fourthly, CBDs identify a subset of a graph, and I honestly can't
> > imagine how that would constrain or influence how a given developer
> > would express knowledge about resources, since even if CBDs are
> > provided by some service, there will likely be other means of access
> > to that information. So I'd need to see some pretty explicit and
> > motivating use cases before I'm convinced that your main problem with
> > CBDs is a real issue.
> 
> Dan Brickley has some experience here. He said that RSS authors asked
> to have foaf:depiction reversed and called foaf:depicts so that it
> would be easier to express as a tree in RDF/XML. The same with
> foaf:made and foaf:maker. 

Sorry, I'm still not fully seeing how this relates to the utility
of CBDs. Perhaps DanBri might have some more detailed use cases
and examples or something that might help me.

> Maybe the influence of CBDs will be
> different than that of the convenient expression of RDF/XML, but the
> parallel is clear.
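To make Eric's concern concrete, here is a minimal Python sketch. It assumes a toy representation (triples as string tuples, blank nodes prefixed `_:`, not any RDF library) and reduces the CBD recipe to arcs-out plus blank-node closure. The URIs ex:alice and ex:photo1 are hypothetical. The same depiction fact lands in ex:alice's CBD only when it is modelled as an arc out of ex:alice.

```python
# Toy model: a graph is a set of (subject, predicate, object) string tuples;
# names starting with "_:" are blank nodes. Not an RDF library.
def is_bnode(term):
    return term.startswith("_:")


def cbd(graph, resource):
    """Arcs-out of `resource`, plus the arcs-out of every blank node reached."""
    result, visited, pending = set(), set(), [resource]
    while pending:
        node = pending.pop()
        if node in visited:
            continue
        visited.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o):
                    pending.append(o)
    return result


# The same fact modelled in both directions (hypothetical URIs).
depiction = {("ex:alice", "foaf:depiction", "ex:photo1")}
depicts = {("ex:photo1", "foaf:depicts", "ex:alice")}

print(cbd(depiction, "ex:alice"))  # the statement travels with ex:alice
print(cbd(depicts, "ex:alice"))    # set(): only ex:photo1's CBD carries it
```

A vocabulary designer who expects clients to fetch CBDs of people, not of photos, therefore has a reason to prefer foaf:depiction over foaf:depicts. That is the kind of modelling pressure Eric is describing.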

Well, I think I'm following your general gist, that if CBDs prove
extremely useful and are widely used (and it won't surprise me
when they are) that folks will optimize their ontologies and
knowledge bases to make them even more useful -- but I still fail
to see how that would be "a bad thing".

I expect there will always be countless use cases which will 
ultimately be best addressed by a fully general query facility
(we have many such use cases ourselves) but even with a general
query facility (such as the DA WG is working on) the ability to
ask for, and receive, a CBD will still have widespread utility.

I have never seen these two as being at odds, but rather as complementary.

And from the perspective of implementation burden, if 9 out of 10
of your applications get along just fine with interchanging CBDs
where queries consist solely of individual URIs, why would you bother
implementing a more comprehensive query solution for those 9
applications that don't need it? And even if it were a matter of just
dropping in the API, why have the code bloat?

(remember, tens of thousands of needless lines of code are a non-trivial
issue on mobile platforms ;-)

KISS...

> > Finally, while there can be some application areas where "arcs-in"
> > information is useful/necessary, in many applications, it can result
> > in a huge number of statements in a graph. As an extreme case, consider
> > a request for a description of rdfs:Resource that included "arcs-in"
> > statements, where inference is enabled (every resource would then
> > contribute an rdf:type statement)... Again, CBDs are not intended to
> > replace a general query facility, or some other form of "resource view"
> > which would accommodate retrieval of "arcs-in" knowledge.
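A rough sketch of why an "arcs-in" description of the RDFS class rdfs:Resource blows up under inference. It assumes a single pass of the RDFS entailment rules rdfs4a/rdfs4b (every subject and every non-literal object is typed rdfs:Resource), ignores axiomatic triples and the other RDFS rules, and uses a hypothetical chain of ex:next links as data. Under these assumptions, the arcs-in of rdfs:Resource grow with every node in the graph while its arcs-out stay empty.

```python
def is_literal(term):
    return term.startswith('"')


def rdfs4_single_pass(graph):
    """One pass of RDFS rules rdfs4a/rdfs4b (simplified): every subject and
    every non-literal object gets an rdf:type rdfs:Resource statement."""
    inferred = set(graph)
    for s, _, o in graph:
        inferred.add((s, "rdf:type", "rdfs:Resource"))
        if not is_literal(o):
            inferred.add((o, "rdf:type", "rdfs:Resource"))
    return inferred


# Hypothetical data: a chain ex:r0 -> ex:r1 -> ... -> ex:r1000.
chain = {(f"ex:r{i}", "ex:next", f"ex:r{i + 1}") for i in range(1000)}
closure = rdfs4_single_pass(chain)

arcs_out = {t for t in closure if t[0] == "rdfs:Resource"}
arcs_in = {t for t in closure if t[2] == "rdfs:Resource"}
print(len(arcs_out), len(arcs_in))  # 0 1001
```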
> 
> Points 1 and 4 have a heuristic support from the way people have
> tended to model data, but they do make an assumption about how it's
> organized. 

And many ontologies explicitly facilitate the definition of taxonomies, 
because that is how people commonly organize their world. People don't
employ taxonomies because the ontologies suggest they do. You seem
to be suggesting that CBDs are in some way the tail wagging the
dog, or in some other way constraining/limiting how people will want
to model and interchange knowledge, and I'm sorry, but I just don't 
think you're succeeding in demonstrating such a point.

>  Serialization and model constraints (no literals as
> subjects) do encourage this model.

I'm not sure how allowing literals as subjects would make the
definition of CBDs much different -- since, given the tidiness
of literals, they would behave similarly to URIref subjects in
a graph, merging in like manner and having equivalent denotation.
I really don't see how whether or not literals are allowed to be
subjects makes much difference to CBDs in any fundamental way.

If literals are ever allowed to be subjects, the CBD definition
could easily, and trivially, be adjusted to allow for them, with
no significant functional issues.

Perhaps you see a problem that I am just missing. An example
would probably help.

Patrick


> > > while there are some applications 
> > > I think the recipe also needs some text to deal with cyclic graphs
> > > of bNodes, but that's a minor point.
> > 
> > Agreed. And thanks for pointing that issue out. I also agree
> > that it's a minor issue and fixed with a single check in the
> > algorithm to avoid infinite loops.
> > 
> > (the present implementation is expressed as inference rules, not
> > as a linear set of steps, so this problem does not arise, hence
> > it being overlooked).
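The following is a procedural sketch, not the Nokia implementation (which, as noted, is expressed as inference rules). It assumes triples as string tuples with blank nodes prefixed `_:`. The "single check" is modelled as a set of already-expanded nodes, which stops the blank-node recursion from looping on a cycle.

```python
def is_bnode(term):
    return term.startswith("_:")


def cbd(graph, resource):
    """Arcs-out of `resource`, recursively following blank-node objects."""
    result, expanded = set(), set()

    def expand(node):
        # The single check: never expand the same node twice, so a cycle of
        # blank nodes such as _:a -> _:b -> _:a terminates.
        if node in expanded:
            return
        expanded.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o):
                    expand(o)

    expand(resource)
    return result


cyclic = {
    ("ex:x", "ex:p", "_:a"),
    ("_:a", "ex:r", "_:b"),
    ("_:b", "ex:r", "_:a"),
}
print(cbd(cyclic, "ex:x") == cyclic)  # True: terminates, all three arcs
```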
> > 
> > Cheers,
> > 
> > Patrick
> > 
> > > On Thu, Sep 30, 2004 at 07:39:32PM -0400, Peter F. Patel-Schneider wrote:
> > > > 
> > > > In the DAWG message archive I came across a reference to a W3C member
> > > > submission from Nokia on Concise Bounded Descriptions
> > > > http://www.w3.org/Submission/CBD/.
> > > > 
> > > > The notion of Concise Bounded Descriptions (CBD) in this note has a
> > > > number of problems.
> > > > 
> > > > The initial description of a CBD is severely underspecified. According
> > > > to the note, ``A [CBD] of a resource is a body of knowledge about that
> > > > resource which does not include any explicit knowledge about any other
> > > > resource which can be obtained separately from the same source.''
> > > > 
> > > > Problem 1:  Which source?
> > > 
> > > The query service.
> > > 
> > > > Problem 2:  What is ``explicit'' knowledge?
> > > 
> > > I'm not sure I would have chosen ``explicit'', but I believe this is
> > > the set of arcs-out from a resource which is reached in a CBD
> > > traversal. All arcs-out from R1 are included in the CBD. If that graph
> > > involves R2 (and R2 isn't a literal or bNode), the client can ask
> > > about R2 in a separate request. Thus, arcs-out from R2 are not
> > > included in R1's CBD.
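Eric's reading can be sketched as follows. The sketch assumes triples as string tuples, blank nodes prefixed `_:` and literals starting with `"`. The URIs and the helper `follow_up_requests` are hypothetical. R1's CBD includes the blank node's arcs-out but leaves out ex:R2's own arcs-out. The helper then lists the URI-named objects a client could ask about in a separate request.

```python
def is_bnode(term):
    return term.startswith("_:")


def is_literal(term):
    return term.startswith('"')


def cbd(graph, resource):
    """Arcs-out of `resource`, plus the arcs-out of every blank node reached."""
    result, visited, pending = set(), set(), [resource]
    while pending:
        node = pending.pop()
        if node in visited:
            continue
        visited.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o):
                    pending.append(o)
    return result


def follow_up_requests(description, resource):
    """Hypothetical helper: URI-named objects a client may ask about next."""
    return {o for _, _, o in description
            if o != resource and not is_bnode(o) and not is_literal(o)}


graph = {
    ("ex:R1", "ex:knows", "ex:R2"),
    ("ex:R1", "ex:address", "_:addr"),
    ("_:addr", "ex:city", '"Tampere"'),
    ("ex:R2", "ex:name", '"R2"'),
}
r1 = cbd(graph, "ex:R1")
print(len(r1), follow_up_requests(r1, "ex:R1"))  # 3 {'ex:R2'}
```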
> > > 
> > > Perhaps ``minutiae'' would be better?
> > > 
> > > > Problem 3:  What is ``obtain separately''?
> > > 
> > > Subsequent query.
> > > 
> > > > Problem 4:  A function that always returns nothing satisfies this
> > > > description, as it certainly does not include any knowledge (explicit
> > > > or not) that can be obtained (separately or not) from the same source
> > > > (or indeed any source at all).
> > > 
> > > Yes, but it is not compliant with the recipe in the
> > > specification. Perhaps the description could be amended to make it
> > > more clear, but I wouldn't expect it to stand on its own as the
> > > definition.
> > > 
> > > > The definition of CBD in terms of a procedure on RDF graphs also has
> > > > serious problems.
> > > > 
> > > > Problem 5:  Given a node in an RDF graph, there is no general way of
> > > > determining which nodes in the graph are co-denotational with that
> > > > node. Consider, for example, the RDF graph:
> > > > Consider, for example, the RDF graph:
> > > > 	_:a ex:b _:c .
> > > > 	_:d ex:e _:f .
> > > > What is the CBD of _:a in this graph?
> > > 
> > > Being a pragmatist (for which I receive the occasional slap), I would
> > > say we are responding with a CBD of what we *do* know about _:a, and
> > > thus return only the first arc. If we later learn that _:a and _:d
> > > are the same node, and the client queries again, they get more arcs,
> > > but nothing contradictory.
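Eric's pragmatic answer can be checked with a small sketch. It computes the CBD of _:a, then renames _:d to _:a and computes it again. The rename stands in for the server later learning that the two nodes co-denote, and the `merge` helper is hypothetical. Under these assumptions the second description is a strict superset of the first, so nothing returned earlier is contradicted.

```python
def is_bnode(term):
    return term.startswith("_:")


def cbd(graph, resource):
    """Arcs-out of `resource`, plus the arcs-out of every blank node reached."""
    result, visited, pending = set(), set(), [resource]
    while pending:
        node = pending.pop()
        if node in visited:
            continue
        visited.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o):
                    pending.append(o)
    return result


def merge(graph, keep, drop):
    """Hypothetical helper: rename `drop` to `keep`, standing in for the server
    later learning that the two nodes denote the same thing."""
    def rename(term):
        return keep if term == drop else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}


peter = {("_:a", "ex:b", "_:c"), ("_:d", "ex:e", "_:f")}
before = cbd(peter, "_:a")
after = cbd(merge(peter, keep="_:a", drop="_:d"), "_:a")
print(before < after)  # True: more arcs, none withdrawn
```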
> > > 
> > > > Problem 6:  This definition does not satisfy the initial description
> > > > of a CBD. Consider, for example, the RDF graph:
> > > > 	ex:a ex:b ex:c .
> > > > 	ex:r rdf:type rdf:Statement .
> > > > 	ex:r rdf:subject ex:a .
> > > > 	ex:r rdf:predicate ex:b .
> > > > 	ex:r rdf:object ex:c .
> > > > the CBD of ex:a in this graph is the graph itself, but it includes
> > > > explicit information about ex:r, a potentially different resource.
> > > 
> > > I haven't really explored CBDs of reifications. Patrick, do you have
> > > any fun use cases for this? Regardless, Peter, do you have any
> > > suggested words for Patrick to include the reification arcs in the
> > > initial description?
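A sketch of the recipe with a reification step, as Peter reads it. Besides arcs-out and blank-node closure, it pulls in the description of any rdf:Statement node that reifies an included triple. The sketch assumes the simplified tuple representation. It matches a reification only when the node has all four of rdf:type rdf:Statement, rdf:subject, rdf:predicate and rdf:object. On Peter's graph it returns the whole graph, which is his point.

```python
def is_bnode(term):
    return term.startswith("_:")


def reifications(graph, triple):
    """Nodes typed rdf:Statement whose subject/predicate/object match `triple`."""
    s, p, o = triple
    return {r for r, pred, obj in graph
            if pred == "rdf:type" and obj == "rdf:Statement"
            and (r, "rdf:subject", s) in graph
            and (r, "rdf:predicate", p) in graph
            and (r, "rdf:object", o) in graph}


def cbd(graph, resource):
    """Arcs-out, blank-node closure, plus the description of each reification
    of an included statement."""
    result, visited, pending = set(), set(), [resource]
    while pending:
        node = pending.pop()
        if node in visited:
            continue
        visited.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_bnode(o):
                    pending.append(o)
                # Reification step: describe each rdf:Statement node for (s, p, o).
                pending.extend(reifications(graph, (s, p, o)))
    return result


peter = {
    ("ex:a", "ex:b", "ex:c"),
    ("ex:r", "rdf:type", "rdf:Statement"),
    ("ex:r", "rdf:subject", "ex:a"),
    ("ex:r", "rdf:predicate", "ex:b"),
    ("ex:r", "rdf:object", "ex:c"),
}
print(cbd(peter, "ex:a") == peter)  # True: the whole graph, including ex:r
```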
> > > 
> > > > Problem 7:  This definition does not provide enough information to
> > > > distinguish the node from other distinguishable nodes in the graph.
> > > > Consider, for example, the RDF graph:
> > > > 	ex:r rdf:type owl:InverseFunctionalProperty .
> > > > 	_:a ex:r _:b .
> > > > 	_:b ex:r _:a .
> > > > 	_:a ex:s "NODE A" .
> > > > 	_:b ex:s "NODE B" .
> > > > Then the CBD of _:a in this graph is
> > > > 	_:x1 ex:r _:x2 .
> > > > 	_:x2 ex:r _:x1 .
> > > > which is the same as the CBD of _:b in this graph but _:a and _:b are
> > > > distinguishable in the graph and thus should have different CBDs.
> > > 
> > > Yeah, but nothing else solves that either. They're ambiguous to the
> > > server and they're ambiguous to the client. The only additional info
> > > that the server has is that there exists in the domain of discourse
> > > another bNode. I don't think it's worth telling the client about it.
> > > 
> > > > (Definition: Two blank nodes, n1 and n2, are indistinguishable in a
> > > > graph G if G with n1 mapped to n2 and n2 mapped to n1 is graph-equal
> > > > to G (i.e., the sets of triples are exactly the same). Any node is
> > > > indistinguishable from itself. Two literal nodes are indistinguishable
> > > > if they mean the same literal value. All other pairs of nodes are
> > > > distinguishable.)
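Peter's definition for blank nodes translates directly into a graph-equality test. This is a minimal sketch assuming the tuple representation. It covers only the identity and blank-node clauses, and literals are compared as identical strings rather than by value. In his graph, swapping _:a and _:b changes the graph because of the ex:s labels, so the two nodes are distinguishable. Without those labels the swap leaves the graph unchanged.

```python
def is_bnode(term):
    return term.startswith("_:")


def swap(graph, n1, n2):
    """Map n1 to n2 and n2 to n1 in every triple."""
    def m(term):
        return n2 if term == n1 else n1 if term == n2 else term
    return {(m(s), m(p), m(o)) for s, p, o in graph}


def indistinguishable(graph, n1, n2):
    """Identity and blank-node clauses only; literals are compared as strings,
    not by literal value, and every other pair counts as distinguishable."""
    if n1 == n2:
        return True
    if is_bnode(n1) and is_bnode(n2):
        return swap(graph, n1, n2) == graph
    return False


peter = {
    ("ex:r", "rdf:type", "owl:InverseFunctionalProperty"),
    ("_:a", "ex:r", "_:b"),
    ("_:b", "ex:r", "_:a"),
    ("_:a", "ex:s", '"NODE A"'),
    ("_:b", "ex:s", '"NODE B"'),
}
unlabelled = {t for t in peter if t[1] != "ex:s"}
print(indistinguishable(peter, "_:a", "_:b"))       # False: ex:s labels differ
print(indistinguishable(unlabelled, "_:a", "_:b"))  # True
```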
> > > -- 
> > > -eric
> > > 
> > > office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
> > >                         Shonan Fujisawa Campus, Keio University,
> > >                         5322 Endo, Fujisawa, Kanagawa 252-8520
> > >                         JAPAN
> > >         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
> > > cell:   +1.857.222.5741 (does not work in Asia)
> > > 
> > > (eric@w3.org)
> > > Feel free to forward this message to any list for any purpose 
> > > other than
> > > email address distribution.
> > > 
Received on Friday, 1 October 2004 09:56:15 UTC