- From: Massimo Marchiori <massimo@w3.org>
- Date: Sun, 7 Apr 2002 21:45:41 -0400
- To: massimo@w3.org, phayes@ai.uwf.edu
- Cc: bwm@hplb.hpl.hp.com, www-rdf-comments@w3.org
Pat, from your reply I essentially get that I have to be more technical and give more details (mea culpa, but usually hard to do so when using email, one usually just wants to synthesize and save time ;). So I'll try to be less "concise" (polite wording for "cryptic" ;) and expand http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/0045.html (not to extreme, but with a bit more of detail). I see you'll be in Amsterdam for WOWG, so chatting will probably be much faster and clearer, but anyway, here's a first short expansion on the mailing list for the record. Motivation: this is essentially something dating back to 1999, when studying how to formalize the "M" in M&S. The modus operandi has been just to formally extract the "minimum" that M&S was saying, and nothing more (so, carefully analyzing M&S *line by line* with mathematical glasses and trying to extract the "normative minimum"). So, line-by-line analyzing the spec, that's the conclusion one can get. First, there's big line that can be drawn, as usual, between the formalization of the data model, and the formalization of the semantics. Now, I talk of an "algebraic" data model because I include in the data model part the functional requirements about containers. Because, this is the maximum where we can go easily without much arbitrariness, because you don't need to define a semantics yet. So anyway, the data model is a normal one (essentially like MT, although even here some choices have to be made, eg in what kind of graphs are allowed... but that's another thread ;). Then you can impose on the data model an equational part, stating the requirements on constraints. I'll keep easy here and just show the easiest part, the Bag. Struggling with ASCII, let's just indicate with BAG(t_1,...,t_k) a shorthand to denote a part of an RDF graph that have a bag with components t_1,...,t_k. Then, the fact that a bag is an unordered list of resources is expressed using the following equivalence relation Eq: B1: BAG(t_1,...,t_k) Eq BAG(t_pi(1),...,t_pi(k)) (where pi is any permutation in [1,k]) [historical note: old w3c members might recognize the complications here partly stem from issue 30 of the original RDF M&S issue list, cf http://www.w3.org/RDF/Group/Syntax/issuesd.html#c30 , which was at the time (1998) not accepted and closed...] Then, every RDF application APP (formally, map from rdf graphs to your domain of interest) should be congruent to Eq (i.e., G1 Eq G2 => APP(G1)==APP(G2) ). Tantamount, we can reason not on the simple data model any more, but on the quotient data model (set of equivalence classes wrt Eq). Things are slightly more complex than this actually, but that's a good hint to see why we can go from a data model to an "algebraically defined" data model, so to say. Now, the good thing of having this algebraic data model is that, so far, things are pretty smooth, as you don't have to define the semantics yet (which is the less formal part of M&S, and so the most troublesome to distill in a sense). The equations express behavioural constraints: every application APP "doing something" with an instance of the RDF data model must treat equivalent (wrt the above equations) instances in the same way. Then, on to the semantics part (not the "needed" one, or the most "useful" one, just the bare minimum (good or bad) that M&S expresses). Now, I won't stay here to express the whole deriving semantics, possible choices and various subtelties (as, that would equal to write down a huge paper that I/nobody didn't find any time to write in years...), so just quickly focus on the parts that matter for our discussion (in a simplified fashion). An easy way to define such a semantics is to use matching graph rules: we give a graph (extended with variables as nodes as well) and every time it matches a part of the graph, it triggers and subst it with something else. The "translation", so to say, eventually produces facts into a target (HOL) logic. Precise definition of the logic is inessential to the present discussion (although yes, of course there'd be much to say here too ;) Now, here alas we only have text, so struggling with ASCII we can use verbatim triples (variables start with the dollar sign, and variables matching anonymous nodes only are denoted by $A). Rule for containers: If there's a subgraph of the form: $A rdf:type rdf:Bag $A rdf:_1 $1 $A rdf:_2 $2 ... $A rdf:_n $n then produce Bag($1,...,$n) (where Bag is then a predicate adequally defined, to respect the equational data model rules). And, similarly for Seq and Alt. Rule for reification: If there's a subgraph of the form: $1 $2 $A $A rdf:subject $S $A rdf:object $O $A rdf:predicate $P $A rdf:type rdf:statement then produce $2($1,$P($S,$O)) Note generalized formality sloppiness here and above (eg "$2(...)" would be something like [[$2]](...) where [[..]] gives the appropriate predicate, etc etc). Also note how how the reification rule needs the "$1 $2" part to trigger, formalizing the fact that reification alone says nothing. So, the above rules just do essentially what M&S say: containers are lists, and reifications allow someone ($1) to make a statement ($2) about another statement ($P($S,$O)). Now, after all these premises, we are finally to anonymous nodes: where have they gone? Line-by-line analyzing the M&S spec, you'll see that anonymous nodes are only used to specify containers and reifications. This doesn't happen by chance, but just because their precise content is *not needed*: they disappear in the semantics (see above, no $A's !). So, that's what I meant when talking in http://lists.w3.org/Archives/Public/www-rdf-comments/2002AprJun/0012.html about the extension chain and saying that anonymous nodes are not "first class" nodes in M&S: Anonymous nodes don't "denote" anything, they're just syntactical convenience in the data model, *exactly like anonymous variables are, for example, in Prolog*. Now, the whole importance of the above constructions are that, as said, they (claim ;) constitute a (possible) "minimal interpretation" of M&S: are formally grounded in the M&S spec (the "M" part), line by line, and one can show that anything in them follows precisely from corresponding lines in the spec, and conversely, that any line in the spec has its corresponding (and, the correspondance is not stretched, but as faithful as possible). Of course, with a possible range of choices (that is clearer the moment you try to do actually the thing, line by line...), but that do not modify the essential features of the resulting semantics (in our case, treatment of anonymous nodes). As instead the current MT differs in essential parts (see eg anonymous nodes), I'd like to hear explanations on how "minimal" this interpretation is (e.g., bare-bones, compare to the previous one wrt anonymous nodes), as I think MT is stretching M&S, in some parts even in excellent ways, but still, stretching (to this extent, it'd have helped even more to see how containers and reification, two of the essential features of RDF, are dealt with). It might turn out the current MT is actually the best way to go (even with a stretch, eventually who cares? ;), but the analysis is far from trivial, and has to be done. Thanks, -M
Received on Sunday, 7 April 2002 21:46:45 UTC