- From: Sean Luke <seanl@cs.umd.edu>
- Date: Mon, 27 Dec 1999 12:43:31 -0500 (EST)
- To: Sankar Virdhagriswaran <sv@crystaliz.com>
- cc: www-rdf-interest@w3.org, jhendler@darpa.mil, heflin@cs.umd.edu
On Thu, 23 Dec 1999, Sankar Virdhagriswaran wrote:

> > So why not just dump the special-case stuff and go with simple
> > (simplistic) general-purpose inferential semantics to begin with?
>
> I am wondering about two different types of scalability:
>
> a) Agent writer(s) scalability: The 'real world' still is hacking away in
> VBScript/Perl. For them, inferencing (of the logic/prolog kind) is too
> foreign. Given an object model, they are more prone to write 'procedural'
> agents. At best, we can hope them to write SQL queries.

That's fair; such people's schemas won't use general inferences. But lack
of use doesn't argue against including such things for those groups which
*do* choose to use inferences.

> Can you imagine writing an 'inferencing engine' for the semantic web
> that can scale the same way if one adopts your model? I think we need
> to start very, very simple since we are looking at a scalability that
> has not been attempted by any of the knowledge representation projects
> I know.

Now _this_ gets to the heart of the matter. While Datalog is polynomial,
and in general I imagine schemas on the web will be relatively disjoint
(or hierarchical at worst), Datalog's worst case is still pretty bad
compared to a single SQL query.

But such things can be had in bits and pieces. As I mentioned before, one
nice model would be for RDF to provide several "levels" to which schemas
can subscribe. A basic level would have no inferential semantics. A higher
level would allow only those semantics which can be easily flattened
(inferences declared "final", that is, which operate only over ground
facts or facts recursively created by the inference itself). This would
give a tremendous advantage to schema-merging and eliminate the need for
redundant ground statements such as the examples in my earlier post.
Higher levels still might permit only inferences actually declared in the
schema proper, or ultimately all inferences.
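[A rough sketch of what I mean by flattening a "final" inference, assuming
a made-up transitive rule such as subClassOf; the function and data names
here are illustrative only, not part of any RDF proposal:]

```python
def flatten_transitive(pairs):
    """Materialize the transitive closure of a binary relation
    (e.g. a subClassOf-style rule) given as (sub, super) tuples.
    Because the rule is "final" -- it only derives facts from ground
    facts and its own derivations -- the closure can be computed once
    at fact-gather time and stored as plain tuples."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Join the relation with itself: (a,b) and (b,d) imply (a,d).
        derived = {(a, d) for (a, b) in closure
                          for (c, d) in closure if b == c}
        if not derived <= closure:
            closure |= derived
            changed = True
    return closure

ground = {("Dog", "Mammal"), ("Mammal", "Animal")}
flat = flatten_transitive(ground)
# flat now also contains ("Dog", "Animal"); a query like "find all
# Animals" becomes a single SQL-style lookup over stored tuples, with
# no inference engine needed at query time.
```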
If Altavista doesn't have a system capable of some level of sophisticated
semantics, it can simply tell people that it doesn't accept schemas which
insist on them. If specialized agents can use inferences to their
advantage, let them. Natural selection on the web will sort this out
nicely.

While it's one thing to say that a Datalog-like model has too much
computational complexity for the current RDBMSs out there, it's hard to
say this while RDF _presently_ has semantic features which are already
difficult to implement in a database. aboutEach, subPropertyOf,
subClassOf, etc. are already O(n^2), though I imagine they could be
flattened at fact-gather time (aboutEach would require keeping the rules
around, because it's open-world). In its simplest-to-implement form
(flattening), reification bulks up relational tuples with unique
identifiers. Reification in combination with aboutEach could also result
in pretty nasty worst-case semantics, though I believe that
serendipitously RDF's restriction against inverse relations lets it
escape that dilemma for now.

At any rate, it seems to me that by insisting on binary relations and
several custom-designed inferential mechanisms, RDF is painting itself
into a corner: it requires these features of even lowly RDBMSs, while
making it difficult to create more general mechanisms that cover these
warts for the more advanced agents which will handle general semantics in
the future.

> We are operating in a completely different world as compared to the
> closed world situations where knowledge representation techniques have
> been used in the past.

Of course the web is open-world. But since both RDF and SHOE are
additive-only and monotonic, I'm not sure this matters.

Sean
Received on Monday, 27 December 1999 12:43:39 UTC