Re: Distributed querying on the semantic web from Patrick Stickler on 2004-04-22 (www-rdf-interest@w3.org from April 2004)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 22 Apr 2004 10:35:34 +0300
To: "ext Phil Dawes" <pdawes@users.sourceforge.net>
Cc: www-rdf-interest@w3.org
Message-Id: <A454A164-942F-11D8-AB55-000A95EAFCEA@nokia.com>
On Apr 19, 2004, at 14:48, ext Phil Dawes wrote:

> The main problem with Patrick's concise-bounded-description idea from
> this respect is how to find references to a term.
>
> For example:
>
> (p:PhilDawes, foaf:knows, ?person)
>
> ..is easy to resolve - just dereference p:PhilDawes and you probably
> have the information you need. (I'm using dereference to mean 'look up
> a description').
>
> However
>
> (?person, foaf:knows, p:PhilDawes)
>
> ..is much more tricky, since these assertions are likely to be made by
> users external to the domain owner of p:PhilDawes.

This has nothing to do with the definition of concise bounded
descriptions (which simply define a particular kind of RDF sub-graph
focused on a particular resource).
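
To make the distinction concrete, here is a minimal sketch of a
simplified bounded description: the statements whose subject is the
resource, plus the statements about any blank nodes reached along the
way. It is only an approximation of the published definition. The tuple
representation, the "_:" blank-node convention, the function names, and
the sample data (Alice, x:Bob) are all assumptions made for
illustration.

```python
# Triples are (subject, predicate, object) tuples. Blank nodes are strings
# starting with "_:" (an illustrative convention, not an RDF library API).

def is_blank(node):
    return isinstance(node, str) and node.startswith("_:")

def bounded_description(graph, resource):
    """Collect statements about `resource`, following blank-node objects."""
    result = set()
    pending = [resource]
    seen = set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if is_blank(o):
                    # Blank nodes have no URI to dereference, so their
                    # statements are pulled into the same description.
                    pending.append(o)
    return result

# Hypothetical sample data.
graph = {
    ("p:PhilDawes", "foaf:name", "Phil Dawes"),
    ("p:PhilDawes", "foaf:knows", "_:b1"),
    ("_:b1", "foaf:name", "Alice"),
    ("x:Bob", "foaf:knows", "p:PhilDawes"),
}
description = bounded_description(graph, "p:PhilDawes")
```

Note that the x:Bob statement is not part of the result: a description
focused on p:PhilDawes says nothing about who refers to p:PhilDawes.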

URIQA does not attempt to provide a general query language/solution, but
is expected to work in harmony with any number of general query
solutions (such as is the focus of the RDF Data Access Working Group).

Thus, there is no "problem" with not providing functionality that was
never intended to be provided.

>
>
> Here's a straw-man solution:
>
> - In addition to its bounded description, dereferencing p:PhilDawes
> also provides all the references it knows about.
>
> - When people author statements referring to p:PhilDawes, they POST
> their triples to the description of p:PhilDawes. (Or maybe a third
> party does).
>
> - The representation of p:PhilDawes polls the reference URIs it knows
> about periodically to keep its data up to date. (facilitating the
> removal of triples as well as addition)
>
>
> This would work, but the immediate problem for this solution is
> scalability. In particular, domain owners of terms for common concepts
> will require massive amounts of storage and bandwidth to maintain
> their lists of references.

I think that there is, and should be, a clear distinction between
authoritative sources of knowledge and registries/repositories of
general knowledge (which harvest knowledge from authoritative sources).

I.e., consider a SW variant of Google which crawls the SW, syndicating
authoritative descriptions of resources from web-accessible sources.
Then, one can ask that centralized repository about e.g. all the folks
who know p:PhilDawes.
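
A rough sketch of that idea, assuming an in-memory registry that indexes
harvested triples by predicate and object so the reverse question can be
answered centrally. The Registry class, its method names, the source
URLs, and the sample people are hypothetical. A real harvester would
fetch and parse descriptions rather than receive tuples.

```python
from collections import defaultdict

class Registry:
    """Toy central registry that indexes harvested triples by object."""

    def __init__(self):
        # (predicate, object) -> set of (subject, source) pairs
        self._by_object = defaultdict(set)

    def harvest(self, source, triples):
        """Syndicate the triples published at one source."""
        for s, p, o in triples:
            self._by_object[(p, o)].add((s, source))

    def who(self, predicate, obj):
        """Answer (?s, predicate, obj) across everything harvested."""
        matches = self._by_object.get((predicate, obj), ())
        return sorted({s for s, _ in matches})

registry = Registry()
# Each site publishes only its own statements (hypothetical sources).
registry.harvest("http://alice.example/foaf",
                 [("a:Alice", "foaf:knows", "p:PhilDawes")])
registry.harvest("http://bob.example/foaf",
                 [("b:Bob", "foaf:knows", "p:PhilDawes"),
                  ("b:Bob", "foaf:knows", "a:Alice")])

people = registry.who("foaf:knows", "p:PhilDawes")
```

Here the sources never talk to each other. Only the registry carries the
cost of the reverse index.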

What you seem to be proposing is that all the authoritative sources
collaboratively update/mirror any related information amongst
themselves, which is (as you point out) a scalability nightmare.

Better to follow the current, proven approach of letting folks publish
what they like, and then let specialized harvesters pick and choose
from that what is relevant to their specialized registries, and allow
customers to interface with the specialized registries.

Doing this using SW techniques simply makes things work that much better
and makes the results that much more precise and useful.

>
>
> I was considering a couple of solutions:
>
> - delegation of reference maintainers.
>
>      The owner of the term delegates its references to a third party,
>      more able to manage the storage and publication. The description
>      it returns on dereference contains rdfs:seeAlso terms pointing to
>      the reference maintainer.

Or just let harvesters crawl the metadata descriptions accessible at
your site and syndicate them into specialized registries.

I.e. the owner of a term just worries about what they care about
and others can use the term or make (non-authoritative) statements
about the thing denoted by the term as they like on their *own* site,
and eventually/ultimately, when all those statements get syndicated
into some specialized centralized repository (e.g. sw.google.com)
then folks can query a broader scope of information than any
individual source might provide (just as present Google users query
a much broader range of content than any individual web site
provides).
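
One way a registry might keep that distinction visible after syndication
is to record where each statement came from. The sketch below treats a
statement as authoritative when it was published on the same host as the
subject's URI. That rule, the function names, and the example URIs are
assumptions for illustration, not something URIQA defines.

```python
from urllib.parse import urlparse

def is_authoritative(subject_uri, source_url):
    """Assumed rule: same host as the subject URI means authoritative."""
    return urlparse(subject_uri).netloc == urlparse(source_url).netloc

def syndicate(published):
    """Tag each harvested (source_url, triple) pair with its provenance."""
    return [
        {
            "triple": triple,
            "source": source_url,
            "authoritative": is_authoritative(triple[0], source_url),
        }
        for source_url, triple in published
    ]

# Hypothetical harvested statements about the same resource.
published = [
    ("http://phil.example/about",
     ("http://phil.example/me", "foaf:name", "Phil Dawes")),
    ("http://other.example/foaf",
     ("http://phil.example/me", "foaf:nick", "pd")),
]
records = syndicate(published)
```

Both statements end up queryable in one place, and a client can still
tell which one came from the owner of the term.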


Patrick



> - usage of owl to split the domains
>
>      For example, terms like q:DoctorInMoseley could be used to
>      aggregate a:Doctor and b:Moseley to split the load.  The a:Doctor
>      description would then include a reference to the
>      q:DoctorInMoseley rdfs:subClassOf a:Doctor term.
>      and b:Moseley could contain
>       q:DoctorInMoseley rdfs:subClassOf [a owl:Restriction;
>                                          owl:onProperty z:servesArea;
>                                          owl:hasValue b:Moseley].
>      statements.
>
>
>
> I am keen to hear any ideas that others may have on the subject since
> in addition to helping bootstrap the semantic web, this is a facility
> that would be very beneficial in my work intranet environment.
>
> Cheers,
>
> Phil
>
>
>

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Thursday, 22 April 2004 03:36:12 UTC