- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Thu, 22 Apr 2004 11:31:03 +0300
- To: "ext Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- Cc: www-rdf-interest@w3.org, pdawes@users.sourceforge.net
On Apr 20, 2004, at 18:45, ext Peter F. Patel-Schneider wrote:

>> Hi Peter,
>>
>> [My ramblings snipped - see rest of thread for info]
>>
>> Peter F. Patel-Schneider writes:
>>
>> [...]
>>
>>> Well, yes, but I don't think that the scheme that you propose is
>>> workable in general. Why not, instead, use information from the
>>> document in which the URI reference occurred? I would claim that this
>>> information is going to be at least as appropriate as the information
>>> found by using your scheme. (It may, indeed, be that the document in
>>> which the URI reference occurs does point to the document that you
>>> would get to, perhaps by using an owl:imports construct. This is, to
>>> me, the usual way things would occur, but I view it as extremely
>>> important to allow for other states of affairs.)
>>
>> Unfortunately, most of the RDF I consume doesn't contain this
>> contextual linkage information (or even appear in well-formed
>> documents). Take RSS 1.0 feeds for example: if there's a term I don't
>> know about, the RSS feed doesn't contain enough context information
>> for my SW agent to get me a description of that term.
>
> Yes, this is a definite problem with some sources - they use terms
> without providing information about their meaning.

??? The term is denoted by a URI. The authoritative meaning of that term
should be obtainable via that URI (e.g. by using a solution such as
URIQA).

Each source which uses a term should not have to bundle along the
definition of that term! Nor should it be mandatory that the source
indicate how/where that term is fully defined by the owner of that term.

All that should matter is the URI. Period. That's all. Nothing more
should be required for the agent to obtain the authoritative description
of that term, if required.

There is *NOTHING* wrong with RSS 1.0 in this regard. There is no reason
whatsoever why an RSS instance should indicate how the definitions of the
terms used are to be obtained. If some client doesn't understand a term,
there should be a standardized, SW-optimized means for the client to
obtain the term's definition (and IMO, that should be done using URIQA or
something similar).

> Such sources are broken and, in my view, violate the vision of the
> Semantic Web.

Then it would appear that your vision of the SW has little intersection
with the more commonly held vision of the SW.

> How, then, to do something useful in these situations? A scheme that
> goes to a standard location (namely the document accessible from the
> URI of the URI reference) is probably no worse than any other. However,
> it should always be kept in mind that this scheme incorporates a leap
> of faith: faith that the standard document has information about the
> term; faith that the standard document has usefully-complete
> information about the term; faith that the document using the term is
> using it in a way compatible with the information in the standard
> document. Each of these leaps of faith can be counter to reality and,
> worse, they can be counter to reality in undetectable ways.

Precisely, which is why thinking in terms of "documents" and limiting
one's search for information about a term to particular documents is
non-scalable and fragile.
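[Editor's note: a minimal sketch, not part of the original message, of what
such a URIQA-style lookup might look like from the client side. MGET is the
method discussed in this thread; the use of Python's http.client and the
example property URI are illustrative assumptions, and a real web authority
may of course not answer MGET at all.]

```python
# Hypothetical client-side lookup of an unknown term, URIQA-style.
# Assumes the web authority behind the term's URI answers MGET requests
# with an RDF description of the resource (an assumption, not a given).
import http.client
from urllib.parse import urlsplit

def describe(term_uri: str) -> str:
    """Ask the term's own web authority for its authoritative description."""
    parts = urlsplit(term_uri)
    conn = http.client.HTTPConnection(parts.netloc)
    # MGET is the URIQA-style "describe this resource" method; nothing but
    # the term's own URI is needed to direct the request.
    conn.request("MGET", parts.path or "/",
                 headers={"Accept": "application/rdf+xml"})
    response = conn.getresponse()
    body = response.read().decode("utf-8")
    conn.close()
    return body

# e.g. an RSS 1.0 consumer hitting an unfamiliar property might call:
# rdf_xml = describe("http://purl.org/dc/elements/1.1/creator")
```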
Just as there are no standards-imposed constraints on how representations
are stored/managed internally by a web server which responds to a GET
request for a given URI and returns a representation -- so too should
there be no standards-imposed (or in any other way imposed) constraints on
how authoritative descriptions are stored/managed internally by a SW
server which responds to an MGET (or similar) request and returns the
description.

Thus, whether that term definition is expressed in one or a dozen places,
whether it is stored in a physical RDF/XML instance or a database, whether
one or a hundred people are involved in its creation or management, all is
irrelevant to the agent and should be rightly hidden from view. All the
agent wants is the authoritative description -- no matter how it is
defined/managed.

The SW needs a layer of opacity in the publication/access of resource
descriptions just as the web provides a layer of opacity in the
publication/access of representations. RDF/XML and OWL "documents" simply
get in the way, and are the wrong level of resolution at which to try to
provide a scalable, global, and efficient infrastructure for the
publication and interchange of resource descriptions across the SW.

> Well, I would expect that a semantic search engine would try to
> present the results of its search in the form
>    <information source> contains/contained <information>

Probably.

> (Amazing! A potential use of RDF reification.)

Named graphs will IMO provide a better solution (and certainly require
fewer triples).

>> My experience has been that once you start writing SW applications,
>> the notion of 'document' becomes clumsy and doesn't provide much
>> value. For example, we have lots of RDF published in documents at
>> work, but typically applications don't go to these documents to get
>> this information - they query an RDF knowledge base (e.g. Sesame)
>> which sucks data in from these documents.
>
> But how then do you determine which information to use? There has to
> be some limit to the amount of information that you use and I don't see
> any method for so doing that does not ultimately depend on documents
> (or other similar information sources such as databases).

Documents are simply the wrong mechanism, at the wrong architectural
layer, with which to construct our "webs of trust". Named, signed graphs
are IMO the answer. (Jeremy Carroll, Chris Bizer, Pat Hayes, and I are
finishing up a paper on an approach to addressing this issue which should
be web-visible soon.)

>> The problem is that if we don't do this soon, a number of centralized
>> spike solutions will appear based on harvesting all the RDF in the
>> world and putting it in one place (e.g. 'google marketplace').
>
> Well, maybe, but I don't see much utility to harvesting the
> information in random (and very large) collections of documents and
> unioning all this information into one information source.

Apart from a very few, if ultimately even only one, highly ambitious
services (such as Google), most collections of knowledge will probably be
highly specialized (e.g. harvesting all wine-related knowledge, or all
knowledge about vintage golf clubs, etc.). And most likely, such
collections would not (necessarily) be collections of "documents" but
collections of knowledge -- harvested via a standardized interface which
rightly hides the underlying mechanisms used to manage such knowledge.
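[Editor's note: a small illustration, not from the thread, of the
reification-versus-named-graphs point above. Attaching provenance to a
single statement via RDF reification costs several triples, while a named
graph carries the same information by naming the graph the statement lives
in. The sketch uses rdflib; all URIs and the ex:source property are
invented for the example.]

```python
# Sketch: attaching a source to one statement via RDF reification vs.
# putting the statement in a named graph. All URIs are invented examples.
from rdflib import Dataset, Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")

# --- RDF reification: four triples just to point at the statement,
# plus one more for the actual provenance claim.
g = Graph()
stmt = EX.stmt1
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.wine42))
g.add((stmt, RDF.predicate, EX.vintage))
g.add((stmt, RDF.object, Literal("1947")))
g.add((stmt, EX.source, EX.wineCatalogue))
print(len(g))      # 5 triples to annotate a single statement

# --- Named graph: the statement lives in a graph named by a URI, and the
# provenance claim is made once, about that graph name.
ds = Dataset()
feed = ds.graph(EX.wineFeed)                         # one graph per source
feed.add((EX.wine42, EX.vintage, Literal("1947")))
ds.add((EX.wineFeed, EX.source, EX.wineCatalogue))   # in the default graph
print(len(feed))   # 1 triple carries the original statement
```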
> I do, however, see lots of utility in analyzing semantic information
> from lots of documents and providing pointers back to those documents,
> suitably organized.

Simply pointing back to documents leaves all the real work to each agent
-- to parse and extract from such documents the individual bits of
information that are needed insofar as a particular term or resource is
concerned. It's not the least bit efficient or scalable.

Consider a mobile client that needs to understand the meaning of some
property. The "document" that defines this is a monolithic RDF/XML
instance for an ontology defining 750 terms, with labels and descriptions
in 17 languages. It is 2.4 MB in size. What a fat lot of help getting the
URI of that massive RDF/XML "document" is going to be when all that is
needed is a concise description of a single property.

What the mobile client *should* be able to do is ask the web authority of
the URI denoting that property for a concise bounded description of that
property, and then proceed with whatever it was doing -- with no concern
for how that knowledge was managed, stored, or partitioned. (A rough
sketch of such an extraction is appended after the signature.)

Thinking in terms of RDF or OWL documents insofar as global access of
resource-specific knowledge is concerned (either authoritative or third
party) is not going to provide a scalable and efficient solution.

Regards,

Patrick

--
Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
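[Editor's note: the sketch referred to above, not part of the original
mail. It shows, using rdflib, the kind of extraction a concise bounded
description service might perform: copy only the statements about the
requested term, following blank nodes so the slice stays self-contained.
Recent rdflib versions also provide a ready-made Graph.cbd() helper. The
ontology and property URIs below are invented.]

```python
# Sketch: extract a concise bounded description (CBD) of a single term
# from a much larger ontology graph, so a small client never has to
# download or parse the whole 2.4 MB document. URIs are invented examples.
from rdflib import BNode, Graph, URIRef

def concise_bounded_description(source: Graph, resource: URIRef) -> Graph:
    """Copy every statement about `resource`, recursing into blank nodes
    so the extracted description is self-contained."""
    cbd = Graph()
    frontier = [resource]
    seen = set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in source.triples((node, None, None)):
            cbd.add((s, p, o))
            if isinstance(o, BNode):        # keep anonymous structure intact
                frontier.append(o)
    return cbd

# Usage sketch: a server loads the big ontology once, then answers
# description requests with only the relevant slice.
# ontology = Graph().parse("http://example.org/big-ontology.rdf")
# slice_ = concise_bounded_description(
#     ontology, URIRef("http://example.org/big-ontology#someProperty"))
# print(slice_.serialize(format="turtle"))
```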
Received on Thursday, 22 April 2004 04:34:00 UTC