RE: Revised draft of CBD from Patrick.Stickler@nokia.com on 2004-10-12 (www-rdf-interest@w3.org from October 2004)

From: <Patrick.Stickler@nokia.com>
Date: Tue, 12 Oct 2004 09:13:14 +0300
To: <otto@math.fu-berlin.de>
Cc: <eric@w3.org>, <pfps@research.bell-labs.com>, <www-rdf-interest@w3.org>
Message-ID: <1E4A0AC134884349A21955574A90A7A50ADD2C@trebe051.ntc.nokia.com>
> -----Original Message-----
> From: ext Karsten Otto [mailto:otto@math.fu-berlin.de]
> Sent: 11 October, 2004 17:12
> To: Stickler Patrick (Nokia-TP-MSW/Tampere)
> Cc: eric@w3.org; pfps@research.bell-labs.com; www-rdf-interest@w3.org
> Subject: RE: Revised draft of CBD
> 
> 
> On Sun, 10 Oct 2004 Patrick.Stickler@nokia.com wrote:
> >
> > [...big cut...]
> > Having CBDs as a default, but having both CBDs and SCBDs
> > defined in a standardized way so that broad support for both can
> > be encouraged and either can be easily requested, would be IMO
> > a very good thing.
> >
> Agreed, but see below.
> 
> >
> >> While this departs from the original CBD question of "tell me
> >> about this resource", I believe it to be a common enough case to
> >> deserve its own "optimal alternative form". This Concise Bounded
> >> Usage Description (CBUD) can easily be derived from the original
> >> CBD definition by exchanging "subject" and "object", plus a minor
> >> modification regarding reifications:
> >>
> >> 1. Include in the subgraph all statements where the *object* of the
> >>     statement is the particular node in question;
> >> 2. Recursively, for all statements identified in the subgraph thus
> >>     far having a blank node *subject*, include in the subgraph all
> >>     statements where the *object* of the statement is the blank node
> >>     in question and which are not already included in the subgraph.
> >> 3. Recursively, for all statements included in the subgraph thus
> >>     far, for all reifications of each statement, include the *four
> >>     RDF reification statements* and the concise bounded *usage*
> >>     description of the rdf:Statement node of each reification.
> >>
> >> (Note that one could also construct symmetric and inverse
> >> functional variants in a similar way when needed.)
> >
> > If I understand this correctly, a CBUD would be a subset of a SCBD,
> > right?
> >
> Not really. A SCBD is basically a CBD, additionally listing the
> inbound arcs for nodes that have any. As you point out in the CBD
> document, this is to a maximum depth of 1.
> 
> In contrast, a CBUD is a sort of reverse CBD: Where the CBD follows
> outbound arcs, the CBUD does the same for inbound arcs (including
> reifications of these). Naturally this will result in depth greater
> than 1.
> 
> > I don't suppose you could provide an example, per the node
> > http://example.com/aReallyGreatBook in the source
> > graph provided in the latest CBD document:
> >
> > http://swdev.nokia.com/uriqa/CBD.html#sourcegraph
> >
> > ???
> >
> 
> This source graph is not a good example; the result of the CBUD
> algorithm is indeed only a subset of the SCBD. However, to illustrate
> the difference, let's add the following to the source graph:
> 
> <!-- indirect reference via an anonymous node -->
> <rdf:Description rdf:about="http://example.com/anotherBookCritic">
>     <ex:rates>
>        <rdf:Description rdf:nodeID="A0">
>          <ex:thumbs>5</ex:thumbs>
>          <rdf:value rdf:resource="http://example.com/aReallyGreatBook"/>
>        </rdf:Description>
>     </ex:rates>
> </rdf:Description>
> 
> <!-- reification of an inbound arc -->
> <rdf:Description rdf:about="http://example.com/aReviewMagazine">
>     <ex:covers>
>        <rdf:Statement>
>          <rdf:subject rdf:resource="http://example.com/anotherBookCritic"/>
>          <rdf:predicate rdf:resource="http://example.com/rates"/>
>          <rdf:object rdf:nodeID="A0"/>
>        </rdf:Statement>
>     </ex:covers>
> </rdf:Description>
> 
> Now the CBUD is clearly different, as it also discovers the
> relationships introduced by the addition above:
> 
> <!-- found by CBUD and SCBD -->
> <rdf:Description rdf:about="http://example.com/anotherGreatBook">
>    <rdfs:seeAlso rdf:resource="http://example.com/aReallyGreatBook"/>
> </rdf:Description>
> <rdf:Description rdf:about="http://example.com/aBookCritic">
>    <ex:likes rdf:resource="http://example.com/aReallyGreatBook"/>
> </rdf:Description>
> 
> <!-- found by CBUD only -->
> <rdf:Description rdf:about="http://example.com/anotherBookCritic">
>    <ex:rates>
>      <rdf:Description rdf:nodeID="A0">
>        <rdf:value rdf:resource="http://example.com/aReallyGreatBook"/>
>      </rdf:Description>
>    </ex:rates>
> </rdf:Description>
> <rdf:Description rdf:about="http://example.com/aReviewMagazine">
>    <ex:covers>
>      <rdf:Statement>
>        <rdf:subject rdf:resource="http://example.com/anotherBookCritic"/>
>        <rdf:predicate rdf:resource="http://example.com/rates"/>
>        <rdf:object rdf:nodeID="A0"/>
>      </rdf:Statement>
>    </ex:covers>
> </rdf:Description>
> 
> Note that in contrast to the second part, SCBD would only include this
> rather unhelpful fact:
> 
> <rdf:Description rdf:nodeID="A0">
>    <rdf:value rdf:resource="http://example.com/aReallyGreatBook"/>
> </rdf:Description>
> 
> While technically correct, this "dangling arc" is only part of the
> real usage relationship.
> 
> >
> >> I find this definition useful for a number of reasons:
> >> [...because it finds things like the extra information above...]
> >
> > Fair enough. Though note that this would make a CBUD incompatible
> > with the URIQA interface (the same for IFCBDs) and thus, while
> > offering clear utility, impose greater implementational requirements
> > and communication overhead than either CBDs or SCBDs, so if a CBUD
> > is a subset of a SCBD, would a SCBD suffice?
> >
> Yes, support for CBUD and other "alternative forms" would require
> yet another HTTP header or method. But as you pointed out,
> "what is this resource?" and "who uses it?" are two distinct
> questions; IMHO that should be reflected by the query infrastructure.
> Btw, do you have anything planned for URIQA in this direction?
> 
> > And if a CBUD is not a subset of a SCBD, per the present definition
> > of an SCBD, it may be reasonable to modify the definition of an SCBD
> > accordingly to address your use case while still allowing effective
> > use of the minimal URIQA interface.
> >
> I believe it should be possible to modify the SCBD definition to
> avoid "dangling arcs". 

OK. I think I finally have a grasp of what a CBUD is. It's basically
performing the CBD extraction "downwards", so that all object nodes
are either URIrefs, literals, or bnodes not acting as the subject
of any statement in the source graph; and then doing essentially
the same thing "upwards", so that all subject nodes are URIrefs.
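
As a rough sketch of the "upward" half, following the three CBUD
steps quoted above. Everything about the representation is an
assumption made for illustration: triples are plain tuples of strings,
blank nodes carry a "_:" prefix, the ex: prefix is taken to expand to
http://example.com/, and the label _:R for the rdf:Statement node is
made up. It is not meant as a reference implementation.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.com/"  # assumed expansion of the ex: prefix


def is_bnode(term):
    # Assumption of this sketch: blank nodes are written "_:label".
    return term.startswith("_:")


def reifications(graph, statement):
    """Return the nodes that reify the given (s, p, o) statement."""
    s, p, o = statement

    def nodes_with(prop, value):
        return {r for (r, pp, oo) in graph if pp == RDF + prop and oo == value}

    return nodes_with("subject", s) & nodes_with("predicate", p) & nodes_with("object", o)


def cbud(graph, node):
    """Collect the inbound ("usage") description of node."""
    graph = set(graph)
    result, pending, seen = set(), [node], set()
    while pending:
        target = pending.pop()
        if target in seen:
            continue
        seen.add(target)
        for statement in graph:
            s, p, o = statement
            if o != target:
                continue
            result.add(statement)           # steps 1 and 2: inbound arc
            if is_bnode(s):
                pending.append(s)           # step 2: climb through blank subjects
            for r in reifications(graph, statement):
                four = {
                    (r, RDF + "type", RDF + "Statement"),
                    (r, RDF + "subject", s),
                    (r, RDF + "predicate", p),
                    (r, RDF + "object", o),
                }
                result |= four & graph      # step 3: the reification statements
                pending.append(r)           # step 3: usage of the rdf:Statement node
    return result


BOOK = EX + "aReallyGreatBook"
graph = {
    (EX + "anotherGreatBook", RDFS + "seeAlso", BOOK),
    (EX + "aBookCritic", EX + "likes", BOOK),
    (EX + "anotherBookCritic", EX + "rates", "_:A0"),
    ("_:A0", EX + "thumbs", "5"),
    ("_:A0", RDF + "value", BOOK),
    (EX + "aReviewMagazine", EX + "covers", "_:R"),  # _:R is a made-up label
    ("_:R", RDF + "type", RDF + "Statement"),
    ("_:R", RDF + "subject", EX + "anotherBookCritic"),
    ("_:R", RDF + "predicate", EX + "rates"),
    ("_:R", RDF + "object", "_:A0"),
}
usage = cbud(graph, BOOK)
```

Against this small graph, the result holds every statement except the
ex:thumbs one, since that arc points away from the blank node rather
than towards the book.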

Fair enough. And that may arguably be a better definition, and more
useful form, for a SCBD. Or we could define both and differentiate 
between them in terms of whether the upward "symmetrical" portion
is partial or full. E.g.

PSCBD "Partially Symmetric CBD" (present def of SCBD)
FSCBD "Fully Symmetric CBD" (CBUD)

Different applications will prefer one over the other. In the
case of a PSCBD, the application is mostly concerned with the
predicate of statements where the objects occur, so 1-level
deep is OK, and bnode subjects for those in-arc statements are
OK. In the case of a FSCBD, the application is interested in
directly related resources, and their descriptions, as well as
the resource denoted by the starting node.
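
To make the partial/full distinction concrete, here is a minimal
sketch under stated assumptions. It follows the reading of SCBD used
in this thread (a CBD plus inbound arcs to a depth of 1), leaves out
reification handling for brevity, and uses tuples of strings with a
"_:" prefix for blank nodes. The function names pscbd and fscbd and
the ex:title arc are hypothetical.

```python
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
EX = "http://example.com/"  # assumed expansion of the ex: prefix


def is_bnode(term):
    # Assumption of this sketch: blank nodes are written "_:label".
    return term.startswith("_:")


def cbd(graph, node):
    """Outbound arcs of node, recursing through blank-node objects."""
    result, pending, seen = set(), [node], set()
    while pending:
        subject = pending.pop()
        if subject in seen:
            continue
        seen.add(subject)
        for s, p, o in graph:
            if s == subject:
                result.add((s, p, o))
                if is_bnode(o):
                    pending.append(o)
    return result


def inbound(graph, node, follow_bnodes):
    """Inbound arcs of node; optionally climb through blank-node subjects."""
    result, pending, seen = set(), [node], set()
    while pending:
        target = pending.pop()
        if target in seen:
            continue
        seen.add(target)
        for s, p, o in graph:
            if o == target:
                result.add((s, p, o))
                if follow_bnodes and is_bnode(s):
                    pending.append(s)
    return result


def pscbd(graph, node):
    # Partially symmetric: inbound arcs one level deep, blank subjects allowed.
    return cbd(graph, node) | inbound(graph, node, follow_bnodes=False)


def fscbd(graph, node):
    # Fully symmetric: keep climbing until every inbound chain ends at a URIref.
    return cbd(graph, node) | inbound(graph, node, follow_bnodes=True)


BOOK = EX + "aReallyGreatBook"
graph = {
    (BOOK, EX + "title", "A Really Great Book"),  # hypothetical outbound arc
    (EX + "aBookCritic", EX + "likes", BOOK),
    (EX + "anotherBookCritic", EX + "rates", "_:A0"),
    ("_:A0", EX + "thumbs", "5"),
    ("_:A0", RDF_VALUE, BOOK),
}
partial = pscbd(graph, BOOK)
full = fscbd(graph, BOOK)
```

Here the two results differ by exactly one statement: the ex:rates arc
from anotherBookCritic to the blank node, which only the fully
symmetric form reaches.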

Both of these are likely to be very useful to particular kinds
of applications, and having them defined in a standardized manner
is a good thing.

That said, insofar as the CBD document is concerned, I don't
plan for that to become a clearing house of definitions of
various forms of descriptions -- as it is meant to reflect
what Nokia has found to be particularly useful, not to 
speculate about what may also be useful in other application
areas. Even the section defining SCBDs and IFCBDs is hard
to fully justify on those grounds, but I will retain it as
it is useful to illustrate the point that CBDs are not presumed
to be the only useful form of description (even if, possibly,
the most generally useful for the broadest range of applications).

How or where various commonly used forms of description could
be documented and presented as a whole is an open question.

I would love to see either the DA WG or the SW BP WG produce
a non-normative advisory document along those lines, but
something less formal, done as a collaboration of interested
parties, would be good too.

> However, I feel this would blur the description
> of the resource that was asked for: Much of the information related
> to inbound arc chains is actually a description of other resources.

Well, this is going to be the case, in varying degrees, for any
description containing more information than contained in a CBD.

> Also, the size of the returned graph could become unwieldy if the
> source graph is very tangled and/or contains a large number of
> anonymous nodes (FOAF?).

True. And it's good to note that this is an issue with many (if not
most) forms of description, not just CBDs.

The extensive (over)use of bnodes to denote "significant" resources
leads to many scalability and access issues. This is not a shortcoming
of vocabularies such as FOAF, but a somewhat disjunct issue about
employing a best practice of naming significant resources.

Unfortunately, the practice of "pointing with one's elbow" rather
than referring by name (i.e. relying on inverse functional properties
to unambiguously refer to a resource rather than using a URI)
introduces inefficiencies in communication. It may require less
effort to grunt and gesture than to clearly articulate, and in
some contexts that may be sufficient, but when communication between
arbitrary partners is expected, using explicit names is best,
even if the names one uses are not identical to the names others use
and one has to note which ones are owl:sameAs which others.

If one is going to talk about such significant resources as people, 
organizations, services, etc. then *name* those resources with
URIs. Yes, also use inverse functional properties so that we can
figure out where there is co-denotation, but still use URIs so
that communication between arbitrary agents can be more efficient.

Cheers,

Patrick
Received on Tuesday, 12 October 2004 06:15:00 UTC