- From: Bob MacGregor <macgregor@isi.edu>
- Date: Sat, 28 Aug 2004 10:57:55 -0700
- To: Frank Manola <fmanola@acm.org>
- CC: www-rdf-interest@w3.org
- Message-ID: <4130C7A3.9070809@isi.edu>
Hi Frank,

I don't want to pile up too many comments on comments, but at least one of your points needs clarification. Not only do I not blame Pat Hayes for the state of reification in RDF, I applaud him. When I first started using RDF, I had definite plans for using statement reification, and the lack of semantics bothered me, because it meant that certain expectations I had regarding what an RDF triple store ought to do for me were not met. After I found out from direct experience how bad statement reification is for representing provenance information, I switched sides and became an opponent of RDF reification. At that point, the lack of semantics became a plus, because it meant that implementers of triple stores had somewhat fewer obstacles in their way when building to spec.

You mention the possibility that if we admit "named containers" (contexts) to RDF, then we open the possibility of people representing provenance information in more than one way. There are two major objections to that position.

One is that it's impossible to prevent that possibility -- we are dealing with logic, and any logic that has sufficient expressivity for practical application will admit more than one way of solving a given technical problem. RDF's lack of semantics makes it in some sense very expressive. I am already expressing provenance information in RDF, and the way I do it is certainly not approved by the RDF spec. In fact, it's not approved by me -- I could have done it the "right" way by embedding reified statements in a named container, but that would have had horrendous space implications and been impossible from a querying point of view. Instead, I invented a "hack" that gives the right behavior, but would never be recommended as an official way to do contexts. So, by necessity, I'm doing exactly what you are counseling we shouldn't be doing.

The second objection to your position is the chicken-and-egg problem.
Ideally, we could experiment with various kinds of provenance representations, figure out which is the best, and then make that a standard. However, the gap between what's needed and what RDF currently offers is too large to admit experiments within the RDF framework. Think about the relational database community. They have gone more than 25 years without a decent solution to representing provenance information. Why? Because their representation doesn't admit a good solution. Unfortunately, RDF is in nearly as bad a state right now -- not quite as bad, because the hack that I use would not be practical within the relational model -- but bad enough. So we are in a position where the only people who can experiment with provenance are those willing to violate the RDF spec.

But it's actually worse than that. To get decent performance while using provenance, you have to make modifications at the implementation level. Relatively few users are going to build or extend an existing triple store just to get provenance information. So, the "experiment" is going to proceed relatively slowly.

On the other hand, suppose Jena implemented quads. Suddenly, the opportunities for experimentation would increase 100-fold (to pick a random number). Then there would be a flurry of experimentation, probably leading to much earlier recognition of what a standard semantics for provenance (a step beyond contexts) should look like. Right now, the people adopting your position are consigning the majority to a Gedankenexperiment, where the expected rate of progress will be about what it is in the relational database community -- nil.

Cheers, Bob

Frank Manola wrote:

> Hi Bob--
>
> Some comments below.
>
> Bob MacGregor wrote:
>
>> Hi Frank,
>>
>> You make many good points; I don't like to get deeply nested, so I'll
>> respond just on top.
>>
>> You say RDF already has containers.
>> True -- it's easy to create a container
>> of things that denote "entities", but it's MUCH less practical to
>> create a container of statements. Yes, it's doable, but this is the
>> Turing argument all over again -- we already have assembly language,
>> but we would like to code in Java.
>
> I understand what you say about the ease of creating containers of
> statements, but I don't think this is quite the Turing argument. In
> the first place, you're proposing to add *another* container
> construct. There certainly may be good reasons why another such
> construct is needed, but it seems reasonable to me to look at this
> pretty carefully (e.g., look at how it interacts with the existing
> containers which, as I pointed out, can still be used for this
> purpose). After all, I don't think we want to duplicate the situation
> we were in with the current reification vocabulary. In the second
> place, it seems to me that the difficulty you describe in creating
> containers of statements is largely due to the difficulty of
> identifying statements, rather than the need for a new container per
> se. Granted that a special kind of container just for statements
> might (indirectly) help deal with some of those problems, but if, for
> example, I ultimately need to identify individual statements, creating
> a separate container for each statement seems like unnecessary
> indirection.
>
>> You are insisting on semantics. RDF has almost no semantics --
>> graphs are just graphs; there is no attempt to assign truth per se.
>> I'm pushing for named containers, another data structure, with no
>> built-in semantics per se (except that the contexts I use allow for
>> contexts within contexts, which induces a few entailments). Note:
>> Pat Hayes has carefully ensured that RDF statement reification has
>> essentially no semantics.
>
> I understand what you're after.
> My point is simply that if you have a
> purely structural approach using containers and nothing else, one
> person might use the container for indicating provenance, and another
> might use the container for an entirely different purpose, and there
> would be no explicit indication which was which (worse, different
> people might use the same structure in *slightly different ways* to
> indicate provenance). My main concern isn't just to represent
> stuff, but to do so in a more interoperable way. Hence my belief that
> we need somewhat more than simply a new container structure (even if
> it's only some additional conventions). Note: You needn't saddle Pat
> with all the "blame" (if that's what it is) for the state of
> reification. In the first place, the entire RDF Core WG is
> responsible for the specs as they stand, not just the individual
> document editors. In the second place, the WG found the reification
> vocabulary in a confusing state, and picked what seemed to be the most
> reasonable interpretation for the provenance use case. The fact that
> reification has almost no semantics mostly follows from the ability
> (or lack of same) of RDF to reflect those semantics (the point you
> made at the beginning).
>
>> Basically, you are advocating a cerebral exercise, followed by adoption.
>> The problem is that it's hard to appreciate the utility of something like
>> contexts unless you have the option to use them (not just imagine what
>> it would be like). Reified statements are a good negative example -- on
>> paper, they look promising, but in practice they s*ck. Only relatively
>> few of us have the luxury of building applications using a real
>> context mechanism (have you?).
>
> If by "advocating a cerebral exercise" you mean advocating thinking a
> little more about fundamental additions to RDF (and their interactions
> with the existing facilities), then that's certainly what I'm doing.
> In particular, I fully appreciate the utility of "contexts" for doing
> dozens of useful things, just as many other people have since
> McCarthy's notes, and Guha's dissertation. The problem is that those
> dozens of useful things tend to involve different meanings and, as I
> said above, I'm concerned about interoperability.
>
> Cheers,
>
> Frank

--
Bob MacGregor
Chief Scientist
Siderean Software Inc
5155 Rosecrans Ave, #1078
Hawthorne, Ca 90250
<http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=5155+Rosecrans+Ave&csz=Hawthorne%2C+Ca+90250&country=us>
bmacgregor@siderean.com <mailto:bmacgregor@siderean.com>
tel: +1-310-491-3424
fax: +1-310-491-3338
Received on Saturday, 28 August 2004 17:58:31 UTC
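
The message above claims that putting reified statements in a named container has "horrendous space implications" compared with quads. The sketch below is one rough way to see the difference; it is not Bob's "hack" (which the message does not describe) and not any triple store's API. The `Triple` and `Quad` types, the helper functions, the `ex:` names, and the `_:stmtN` blank-node labels are all hypothetical; only the `rdf:` vocabulary terms (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`, `rdf:Bag`, `rdf:_n`) come from the RDF specification.

```python
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class Triple(NamedTuple):
    s: str
    p: str
    o: str


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # name of the container (context) that holds the statement


def reify_into_bag(statements, bag):
    """Describe each statement with the four reification triples and add
    one rdf:_n membership triple that places it in an rdf:Bag."""
    out = [Triple(bag, RDF + "type", RDF + "Bag")]
    for n, st in enumerate(statements, start=1):
        node = f"_:stmt{n}"  # hypothetical blank-node label
        out += [
            Triple(node, RDF + "type", RDF + "Statement"),
            Triple(node, RDF + "subject", st.s),
            Triple(node, RDF + "predicate", st.p),
            Triple(node, RDF + "object", st.o),
            Triple(bag, f"{RDF}_{n}", node),
        ]
    return out


def as_quads(statements, graph):
    """Quad approach: each statement carries its container name directly."""
    return [Quad(st.s, st.p, st.o, graph) for st in statements]


def in_context(quads, graph):
    """Recover the statements of one container with a single filter."""
    return [Triple(q.s, q.p, q.o) for q in quads if q.g == graph]


# Hypothetical example data.
statements = [
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:bob", "ex:worksFor", "ex:acme"),
    Triple("ex:acme", "ex:locatedIn", "ex:hawthorne"),
]

reified = reify_into_bag(statements, "ex:source1")
quads = as_quads(statements, "ex:source1")
print(len(reified), len(quads))
```

Under these assumptions the reified form needs five triples per statement plus one for the container, while the quad form needs one record per statement. Retrieving a container's statements from the reified form would also require joining the subject, predicate, and object triples for every member, whereas the quad form needs only a filter on the fourth field.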