- From: David Megginson <david@megginson.com>
- Date: Sat, 30 Dec 2000 07:21:33 -0500 (EST)
- To: xml-dev@lists.xml.org, www-rdf-interest@w3.org
Seth Russell writes:

 > So true, the Semantic Web doesn't work without "data smushing"! I
 > think we should even apply "data smushing" to nodes with URIs,
 > cause there gonna be people misapplying URIs. My question is: has
 > anybody come up with some good algorithms for "data smushing" ? (I
 > love that term, I've used it 3 times now.) Maybe we should come up
 > with a schema for expressing smushing rules in RDF ... any hint of
 > that being done yet?

There are two separate problems here:

1. combining data from two different sources; and
2. pruning redundant entities.

It may be the case that the different sources use different URIs to identify the same entity; likewise, a single source with a large database might end up with many duplicate versions of the same entity shadowing each other.

Outside the research lab, #2 is extremely difficult. For #1, however, all we have to do is extend the (oversimplified version of the) RDF logical model to include one more member:

  {predicate, subject, object, source}

where source is a URI representing the source of the information (probably, but not necessarily, the URL of an RDF document; it could also be a URI representing a news wire, for example).

Now, query operations, searches, etc. can take into account where the information came from, and can distinguish, say, two "name" properties provided by the same source from two "name" properties provided by two different sources.

<rant>

As I've mentioned many times before, the published RDF logical model needs to be extended anyway because it does not distinguish specific subjects from open-ended subject patterns (rdf:aboutEachPrefix), it does not distinguish literal objects from resource objects, and it does not allow for xml:lang (which the RDF spec states is significant in RDF processing).
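To make the four-member model {predicate, subject, object, source} from before the rant concrete, here is a minimal sketch in Python. The Statement type, the values_by_source helper, and every example.org URI are made up for illustration; nothing here comes from the RDF spec or from any existing RDF library.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One statement in the extended model {predicate, subject, object, source}."""
    predicate: str
    subject: str
    object: str
    source: str  # URI identifying where the statement came from


def values_by_source(statements, subject, predicate):
    """Group the objects asserted for (subject, predicate) by source URI."""
    grouped = defaultdict(list)
    for st in statements:
        if st.subject == subject and st.predicate == predicate:
            grouped[st.source].append(st.object)
    return dict(grouped)


# Hypothetical data: all of these URIs are invented for the example.
NAME = "http://example.org/schema#name"
PERSON = "http://example.org/people/1"
SOURCE_A = "http://example.org/a.rdf"
SOURCE_B = "http://example.org/b.rdf"

statements = [
    Statement(NAME, PERSON, "Alice Smith", SOURCE_A),
    Statement(NAME, PERSON, "A. Smith", SOURCE_A),
    Statement(NAME, PERSON, "Alice J. Smith", SOURCE_B),
]

# Two "name" values from one source stay distinguishable from a
# "name" value that a second source supplied.
names = values_by_source(statements, PERSON, NAME)
```

With the source carried on every statement, a query can decide for itself whether to trust, merge, or keep apart values that came from different places.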
A logical model that takes all of this into account would look something like

  {predicate, subject, subjectType, object, objectType, lang}

or, with the source information,

  {predicate, subject, subjectType, object, objectType, lang, source}

You could argue that subjectType is an internal trait of the subject, and that objectType and lang are internal traits of the object, but then the grammar needs to be elaborated properly:

  statement: predicate, subject, object
  predicate: URI
  subject: URI, subjectType
  subjectType: ("uri" | "pattern")
  object: URI, objectType, lang
  objectType: ("literal" | "resource")
  lang: LITERAL

It's still not all that bad, but the {predicate, subject, object} thing was always bogus.

</rant>


All the best,


David

-- 
David Megginson                 david@megginson.com
           http://www.megginson.com/
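For anyone who wants to experiment with the elaborated grammar above, here is a rough sketch of it as Python data classes. All the class and field names are assumptions made for illustration (the object's first member is called "value" here because it may hold a literal), and treating a "pattern" subject as a URI prefix match is only a simplified reading of rdf:aboutEachPrefix, not a full implementation of it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SubjectType(Enum):
    URI = "uri"          # one specific resource
    PATTERN = "pattern"  # open-ended, as with rdf:aboutEachPrefix


class ObjectType(Enum):
    LITERAL = "literal"
    RESOURCE = "resource"


@dataclass(frozen=True)
class Subject:
    uri: str
    type: SubjectType = SubjectType.URI

    def matches(self, resource_uri: str) -> bool:
        """Assumed semantics: a pattern matches every URI it prefixes."""
        if self.type is SubjectType.PATTERN:
            return resource_uri.startswith(self.uri)
        return resource_uri == self.uri


@dataclass(frozen=True)
class Object:
    value: str                  # a URI, or literal text
    type: ObjectType
    lang: Optional[str] = None  # xml:lang, where one applies


@dataclass(frozen=True)
class Statement:
    predicate: str
    subject: Subject
    object: Object
    source: Optional[str] = None  # optional source URI


# Hypothetical statement: every URI below is invented for the example.
title = Statement(
    predicate="http://example.org/schema#title",
    subject=Subject("http://example.org/docs/", SubjectType.PATTERN),
    object=Object("Documentation", ObjectType.LITERAL, lang="en"),
    source="http://example.org/site.rdf",
)
```

Keeping subjectType inside the subject and objectType and lang inside the object, as the grammar does, leaves the statement itself at three members plus the optional source.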
Received on Saturday, 30 December 2000 11:55:04 UTC