Re: Distributed querying on the semantic web

From: Patrick Stickler <patrick.stickler@nokia.com>
Subject: Re: Distributed querying on the semantic web
Date: Thu, 22 Apr 2004 11:31:03 +0300

> On Apr 20, 2004, at 18:45, ext Peter F. Patel-Schneider wrote:
> 
> >> Hi Peter,
> >>
> >> [My Ramblings snipped - see rest of thread for info]
> >>
> >> Peter F. Patel-Schneider writes:
> >>>
> >> [...]
> >>>
> >>> Well, yes, but I don't think that the scheme that you propose is
> >>> workable in general.  Why not, instead, use information from the
> >>> document in which the URI reference occurred?  I would claim that
> >>> this information is going to be at least as appropriate as the
> >>> information found by using your scheme.  (It may, indeed, be that
> >>> the document in which the URI reference occurs does point to the
> >>> document that you would get to, perhaps by using an owl:imports
> >>> construct.  This is, to me, the usual way things would occur, but I
> >>> view it as extremely important to allow for other states of
> >>> affairs.)
> >>>
> >>
> >> Unfortunately, most of the RDF I consume doesn't contain this
> >> contextual linkage information (or even appear in well formed
> >> documents). Take RSS1.0 feeds for example: If there's a term I don't
> >> know about, the RSS feed doesn't contain enough context information
> >> for my SW agent to get me a description of that term.
> >
> > Yes, this is a definite problem with some sources - they use terms
> > without providing information about their meaning.
> 
> ???
> 
> The term is denoted by a URI. The authoritative meaning of that term
> should be obtainable via that URI (e.g. by using a solution such as
> URIQA).
> 
> Each source which uses a term should not have to bundle along the
> definition of that term! 

Sure, but where does this imply a need to go to an authoritative place to
get this information?
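
[For reference: a URIQA-style request targets the term's own URI, but with
the MGET method in place of GET.  A minimal sketch in Python; the authority
and the term URI are hypothetical:]

    import http.client

    # Ask the web authority of the URI for a description of the resource,
    # using MGET on the very same URI a browser would GET.
    conn = http.client.HTTPConnection("example.org")   # hypothetical authority
    conn.request("MGET", "/terms/vintage")             # hypothetical term URI
    response = conn.getresponse()
    print(response.status)
    print(response.read().decode())   # an RDF description of the term, if any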

> Nor should it be mandatory that the source
> indicate how/where that term is fully defined by the owner
> of that term.

I more or less agree with this.  In the absence of any indication to the
contrary, it is generally a good idea to have a base mechanism for picking
up information about terms.  However, why should there not also be a
mechanism (such as owl:imports) that can be used by the creator of a
document to indicate where other information about terms used in the
document, including information that could be deemed to provide
`definitions' for those terms, can be found?
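
For concreteness, a minimal sketch of what following owl:imports might look
like (Python with rdflib; the document URL is hypothetical):

    from rdflib import Graph
    from rdflib.namespace import OWL

    def load_with_imports(url, graph=None, seen=None):
        # Parse the document at `url`, then follow every owl:imports link
        # it declares, accumulating everything into a single graph.
        graph = graph if graph is not None else Graph()
        seen = seen if seen is not None else set()
        if url in seen:
            return graph
        seen.add(url)
        graph.parse(url)
        for imported in list(graph.objects(None, OWL.imports)):
            load_with_imports(str(imported), graph, seen)
        return graph

    g = load_with_imports("http://example.org/mydoc.rdf")  # hypothetical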

> All that should matter is the URI. Period. That's all. Nothing more
> should be required for the agent to obtain the authoritative description
> of that term, if required.

Well, here is where I protest.  I believe that the Semantic Web would be
stifled by the notion that ``authoritative'' information is (only) to be
found at certain places.

> There is *NOTHING* wrong with RSS 1.0 in this regard. There is no
> reason whatsoever why an RSS instance should indicate how the
> definitions of the terms used should be obtained.
> 
> If some client doesn't understand a term, there should be a standardized
> SW-optimized means for the client to obtain the term's definition (and
> IMO, that should be done using URIQA or something similar).
> 
> 
> > Such sources are
> > broken and, in my view, violate the vision of the Semantic Web.
> 
> Then it would appear that your vision of the SW has little
> intersection with the more commonly held vision of the SW.

Well, I'm sorry if you think so.  I would hope that the commonly held
vision of the Semantic Web still has some notions related to common
communications mechanisms; a shared naming mechanism; formal models of
information; and some role for reasoning.  I would further hope that the
commonly held vision of the Semantic Web still has a place for non-orthodox
points of view.

> > How, then, to do something useful in these situations?  A scheme that
> > goes to a standard location (namely the document accessible from the
> > URI of the URI reference) is probably no worse than any other.
> > However, it should always be kept in mind that this scheme incorporates a
> > leap of faith: faith that the standard document has information about
> > the term; faith that the standard document has usefully-complete
> > information about the term; faith that the document using the term is
> > using it in a way compatible with the information in the standard
> > document.  Each of these leaps of faith can be counter to reality
> > and, worse, they can be counter to reality in undetectable ways.
> 
> Precisely, which is why thinking in terms of "documents" and limiting
> one's search for information about a term to particular documents is
> non-scalable and fragile.

Huh?  Why?  I'm not suggesting replacing the notion of a document with
anything different.  I'm not suggesting limiting searches to particular
named documents.

> Just as there are no standards-imposed constraints on how representations
> are stored/managed internally by a web server which responds to a GET
> request for a given URI and returns a representation -- so too should
> there be no standards-imposed (or in any other way imposed) constraints
> on how authoritative descriptions are stored/managed internally by a SW
> server which responds to an MGET (or similar) request and returns the
> description.

Sure, why should anyone care how information is stored locally?  Perhaps
I'm somehow using ``document'' in a way different from you.  By
``document'' all I mean is the contents of a response to a GET (or MGET or
whatever), whether encoded in RDF/XML or otherwise.

I do, however, attribute some pragmatic effect to document boundaries -
perhaps that is what you are objecting to.   At the current stage of the
Semantic Web I don't see any mechanism besides document boundaries for
delimiting supposedly-cohesive chunks of information.  

> Thus, whether that term definition is expressed in one or a dozen places,
> whether it is stored in a physical RDF/XML instance or a database,
> whether one or a hundred people are involved in its creation or
> management, all is irrelevant to the agent and should be rightly hidden
> from view. All the agent wants is the authoritative description -- no
> matter how it is defined/managed.

Well, agents certainly want cohesive chunks of information, however they are
generated.  What other mechanisms besides document boundaries are available
for determining these boundaries at the current time, however?

> The SW needs a layer of opacity in the publication/access of resource
> descriptions just as the web provides a layer of opacity in the
> publication/access of representations.
> 
> RDF/XML and OWL "documents" simply get in the way, and are the wrong
> level of resolution to try to provide a scalable, global, and efficient
> infrastructure for the publication and interchange of resource
> descriptions across the SW.

So what are you going to replace them with?

[...]

> >> My experience has been that once you start writing SW applications,
> >> the notion of 'document' becomes clumsy and doesn't provide much
> >> value. For example, we have lots of RDF published in documents at
> >> work, but typically applications don't go to these documents to get
> >> this information - they query an RDF knowledge base (e.g. sesame)
> >> which sucks data in from these documents.
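
[For illustration, the aggregation pattern described above might look
roughly like this (Python with rdflib; the feed URLs and the query are
hypothetical):]

    from rdflib import Graph

    # Build one queryable store out of several harvested documents,
    # then query the store rather than the documents themselves.
    kb = Graph()
    kb.parse("http://example.org/feed1.rdf")   # hypothetical sources
    kb.parse("http://example.org/feed2.rdf")

    results = kb.query("""
        SELECT ?item ?title WHERE {
            ?item <http://purl.org/rss/1.0/title> ?title .
        }
    """)
    for item, title in results:
        print(item, title)
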
> >
> > But how then do you determine which information to use?  There has to
> > be some limit to the amount of information that one uses, and I don't
> > see any method for doing so that does not ultimately depend on
> > documents (or other similar information sources such as databases).
> 
> Documents are simply the wrong mechanism, at the wrong architectural
> layer, to construct our "webs of trust". Named, signed graphs are IMO
> the answer.
> 
> (Jeremy Carroll, Chris Bizer, Pat Hayes, and I are finishing up a paper
> on an approach to addressing this issue which should be web-visible
> soon).

Yes, this might be a more-general mechanism than documents.  However, I
don't see why documents cannot be used for this purpose.  
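
[A rough sketch of the named-graphs idea (Python with rdflib, TriG-style
syntax; the URIs are hypothetical and the signing half is not shown):]

    from rdflib import Dataset

    # Two named graphs: every statement remains attributable to the
    # graph (and hence the source) that asserted it.
    ds = Dataset()
    ds.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:graph1 { ex:Zinfandel a ex:Wine . }
        ex:graph2 { ex:Zinfandel a ex:Grape . }
    """, format="trig")

    for s, p, o, g in ds.quads((None, None, None, None)):
        print(g, s, p, o)   # which named graph said what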

> >> The problem is that if we don't do this soon, a number of centralized
> >> spike solutions will appear based on harvesting all the RDF in the
> >> world and putting it in one place (e.g. 'google marketplace').
> >
> > Well, maybe, but I don't see much utility to harvesting the
> > information in random (and very large) collections of documents and
> > unioning all this information into one information source.
> 
> Apart from a very few (perhaps ultimately only one) highly ambitious
> services such as Google, most collections of knowledge will probably
> be highly specialized (e.g. harvesting all wine-related knowledge, or
> all knowledge about vintage golf clubs, etc.).

And even these collections are almost certainly going to be internally
inconsistent if unioned together.  A partial solution, of course, is for
the aggregator to not do the union, but instead to leave the information in
separate information sources.

> And most likely, such collections would not (necessarily) be collections
> of "documents" but collections of knowledge -- harvested via a
> standardized interface which rightly hides the underlying mechanisms
> used to manage such knowledge.

Well, sure, hiding the underlying mechanism could be useful, just as it is
useful that Google hides how it does what it does.  Again, however, why
does this do anything that goes against a document model?

> >  I do,
> > however, see lots of utility in analyzing semantic information from
> > lots of documents and providing pointers back to those documents,
> > suitably organized.
> 
> Simply pointing back to documents is leaving all the real work for each
> agent -- to parse and extract from such documents the individual bits
> of information that are needed insofar as a particular term or resource
> is concerned.

> It's not the least bit efficient or scalable.

Well, sure, but I think that this is neither desirable nor necessary.  Why
bother to do this - just accept documents as cohesive information sources
without picking and choosing?

> Consider a mobile client that needs to understand the meaning
> of some property. The "document" that defines this is a monolithic
> RDF/XML instance for an ontology defining 750 terms with labels
> and descriptions in 17 languages. It is 2.4 MB in size.
> 
> What a fat lot of help getting the URI of that massive RDF/XML
> "document" is going to be when all that is needed is a concise
> description of a single property.

Well, it may be that this massive RDF/XML is indeed necessary to do
anything useful with the term.  How is any automated system going to know?

> What the mobile client *should* be able to do, is to ask the web
> authority of the URI denoting that property for a concise bounded
> description of that property, 

(One question I have here: how would it be possible to determine the
concise bounded description of a property (or other URI reference) without
recourse to some human-provided partitioning of information, which could
easily be manifest in documents?)

> and then proceed with whatever it
> was doing -- with no concern for how that knowledge was managed,
> stored, partitioned, etc. etc.

What the client could do, yes, is to inquire at some WWW address for
useful information concerning a URI reference.  If what comes back is a
2.4MB document, then so be it.  If what comes back is a much smaller
document, then so much the better.  The system responding could internally
do whatever it wanted to do, of course.  
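
[For reference, a concise bounded description can be computed along these
lines: every statement whose subject is the given node, recursing only
through blank-node objects.  A sketch in Python with rdflib; the
reification part of the definition is omitted:]

    from rdflib import Graph, BNode

    def cbd(source, node, out=None, seen=None):
        # Collect all statements about `node`; blank-node objects have no
        # global name, so their descriptions are pulled in recursively.
        out = out if out is not None else Graph()
        seen = seen if seen is not None else set()
        if node in seen:
            return out
        seen.add(node)
        for s, p, o in source.triples((node, None, None)):
            out.add((s, p, o))
            if isinstance(o, BNode):
                cbd(source, o, out, seen)
        return out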

> Thinking in terms of RDF or OWL documents insofar as global
> access of resource-specific knowledge is concerned (either
> authoritative or 3rd party) is not going to provide a scalable
> and efficient solution.

Why not?  Neither RDF nor OWL documents need be 2.4MB in size.  

> Regards,
> 
> Patrick
> 
> --
> 
> Patrick Stickler
> Nokia, Finland
> patrick.stickler@nokia.com

Peter F. Patel-Schneider
Bell Labs Research

Received on Thursday, 22 April 2004 11:39:59 UTC