- From: Christian De Sainte Marie <csma@fr.ibm.com>
- Date: Tue, 15 Jun 2010 22:32:20 +0200
- To: Jos de Bruijn <jos.debruijn@gmail.com>
- Cc: RIF <public-rif-wg@w3.org>
- Message-ID: <OFED3DFC45.A63F6E5D-ONC1257743.004F7619-C1257743.0070D4CD@fr.ibm.com>
Jos, all,

Jos wrote on 15/06/2010 14:08:09:

> Herewith my review of the XML-Data document as of 2010-06-15T09:25 CEST.

Thanks for the thorough review (we have also already had some discussions off-line).

> Overall, I think the document is going in the right direction. I
> believe it is in line with earlier discussions we had in the group
> concerning RIF+XML combinations. There are, however, several issues
> (mainly the comments 10-23) that I think should be resolved before
> publication of the document as public working draft. Detailed comments
> are below.

So, let us try to resolve those issues quickly, so we can publish the WD on June 22 :-)

> I will start with some issues which I believe require discussion in the group:
>
> [update: in the current version of the document, issue 1 has been
> resolved by implementing solution a)]
>
> 1- The document assumes that the location argument in an Import
> directive in Core is optional (e.g., in the definition just before
> section 4.1). This is not the case; in Core, the location argument is
> mandatory. Thus, the document implicitly assumes an extension of Core.
> I think it is not desirable to define such an extension, since it will
> make the whole RIF landscape even more complex than it currently is.
> Furthermore, this extension is problematic, since in the presentation
> syntax it is not possible to distinguish between an Import statement
> having only a location and one having only a profile.
> Now, the reason for having this extension in the first place is to be
> able to use an XML Schema as the data model of a ruleset without
> having to specify where the XML instance data comes from.
> Two obvious solutions that are in Core come to mind:
> a) use a dummy URI to denote an empty XML instance document (e.g.,
> rif:emptyXML)
> b) put the XML Schema in the location field and define a profile for
> XML Schema (e.g., rif:xml-schema)

The reason I prefer solution (a) is that, with solution (b), the schema would be in different locations with the same semantics, depending on whether or not there is a link to an XML data document to be imported.

In the updated version, I use the IRI:
http://www.w3.org/2007/rif-import-location#no-data

> 2- I find it slightly awkward to have strings as attributes in frameformulas.
> I mean as attributes in frame formulas. The way the semantics is
> defined, element and attribute names are represented as strings in the
> attribute position of frame formulas.
> e.g., if you have <A B=""><C></C></A>
>
> this corresponds (roughly) to the RIF formula
> ?x["attribute(B)"->"" "C" -> ""]
>
> I think it would be natural to require all elements in an XML document
> to have namespaces (default namespaces are easy to add). However,
> attributes are a slightly more complicated issue, since the default
> namespace does not apply to them. Therefore, I don't really have an
> elegant solution in mind at the moment.

I agree that we could limit ourselves to XML documents where all the elements are in a namespace. That would be a restriction, but the use of namespaces is gaining ground, if it is not already prevalent.

But the default is that attributes are not qualified, so, even if the elements are in a namespace, the attributes will not be, in most cases (e.g., attributes are not qualified in the RIF schemas). The rec on namespaces considers that attributes belong, de facto, to the namespace of the owner element, but the XML Schema spec does not say anything about that, AFAIK; so we cannot just use the namespace of the owner element.
And the lexical space of rif:iri is that of absolute IRIs, so we cannot have a rif:iri with only a local name :-( That is why I included xs:NCName. But if somebody has a better solution...

One question is: is it possible for an element to have two attributes with the same local name, one being in the same namespace as the element, the other being in no namespace? I see nothing that would forbid that case, but if there is something, then we could follow the namespace rec and associate namespace-less attributes with the namespace of the owner element.

> Further substantive comments:
>
> 10- why give separate definitions for the semantics of Core+XML and
> BLD+XML combinations? The semantics of RIF Core is the same as that of
> BLD; the only difference between the two dialects is the syntax. I
> would suggest to remove section 4.2 and say that the semantics in
> section 4.1 applies to both dialects.

Your argument that the semantics defined for the one applies to the other as well works both ways: I essentially used it the other way round, which seems more natural to me. Core is the core dialect, so it seemed to make sense to specify the semantics of the combinations for Core and then extend it to BLD and PRD, which, from the user's point of view, are extensions of Core.

I am not convinced that we should do otherwise, but if there is overwhelming support for rewriting everything the other way round, I will do it.

> 11- as discussed (privately), all element information items in an
> instance of the data model are meant to be distinct. This must be
> mentioned in the definition.

Actually, that one has already been taken care of. I added the following sentences, just before the definition of Core+schemaless XML interpretations: "Finally, in the remainder of this document, the notation {I_DM} will be used to denote the set of all the element information items in I_DM, after the references have been resolved.
Notice that, after the references have been resolved, all the elements in {I_DM} are distinct."

I will add (see 17, below) that everything else is unchanged when replacing RIF Core/BLD semantic structures with RIF Core/BLD+XML data combined interpretations in the definitions. Is that ok?

> 12- Is there a difference between QName and expanded QName? If so,
> what is the difference?

A QName is a string made of an optional prefix, a colon and a local name. An expanded QName is a triple that contains an optional prefix, an optional IRI and a local name, where the IRI is the IRI associated with the prefix.

Shall I copy the XDM definition into the document? I thought I would put it in the glossary (to come in a future WD).

> 13- section 3.2, 8. [typed value], first bullet: why do you deviate
> from the XQuery data model?

Because we need a handle to the element information itself, when it is object-like (that is, element-only children), so we can dig into it. And XDM, in that case, defines the typed value as being undefined, which is useless in our case...

> 14- section 4: what are XML instance and data documents, and what
> is the difference with XML documents? Both notions should be defined.

XML instance, or data, document as opposed to an XML schema (which is also an XML document and, btw, could very well play the role of the data document in a combination). I think that "XML instance document" is the usual term for XML documents that are instances of a schema. I used "XML data document", or "XML data", when talking of the XML data with which the RIF document is combined.

Do you really think that requires an explanation? Would an entry in the glossary be enough, or does it need to be more prominent in the spec?

Anyway, I fear that, most of the time, I used "instance" and "data" interchangeably, so I have to check that as well.

> 15- section 4: why limit yourself to combination with only one XML
> document?
> In fact, the Core syntax does not have this limitation, so
> it is unclear how

You are absolutely right, the intent is not to limit to one document. And since the definition now uses the set {I_DM} instead of the sequence I_DM (as you rightly pointed out in a private discussion that it should), it is pretty easy to correct that. I will do it before tomorrow noon, my time.

> 16- a RIF document is interpreted using a semantic multi-structure,
> not a semantic structure. This needs to be taken into account in the
> definitions in section 4.

The spec says explicitly that, apart from the additional constraints in the definition of a semantic structure, the semantics of RIF Core/BLD+XML data combinations is unchanged from the semantics of RIF Core/BLD. Is that not sufficient? Or do you think there are also differences in the handling of multi-structures?

I did not check, to tell the truth :-( I will, but I am not sure that I can do anything in time for a publication on June 22: if changes need to be made, can we make do with an editor's note for this round?

> 17- notions of consistency and entailment, based on combined
> interpretations, need to be defined for RIF+XML combinations. Stating
> that these notions remain unchanged from Core does not work, since you
> do not have Core structures, but combined interpretations here.

Well, combined interpretations are semantic structures for RIF+XML data combinations, aren't they?

Anyway, would "the definitions of [these notions] remain unchanged, except that every reference to a semantic structure I is replaced by a reference to a combined interpretation <I, I_DM>" do? Or do you think that the definitions should be repeated? But that would complicate the spec unnecessarily (or, rather, give it the appearance of complexity), I think.

> 18- section 4.1, 4th paragraph: constants are not "in" any lexical
> space. Constants have the form l^^s, where l is a string and s an IRI
> denoting a symbol space.

I will correct the terminology.
By tomorrow noon.

> 19- section 4.1.1, first bullet: the definition of string-matches is a
> bit hard to read and overly restrictive (e.g., it does not account for
> rdf:PlainLiterals without language tags). I would suggest to either
> match L_dt(c) (here, L_dt is the lexical-to-value mapping of the
> datatype of c) with [string value] or, better yet, just give a
> semantic definition: a string s string-matches i iff s=[string value]
> after white space normalization [of both s and [string value], I
> presume]. Similar for the second bullet.

That is what I thought it said (after I changed the definition following our earlier discussion on the subject)! :-) But I will revise, using your suggested wording. By tomorrow noon.

> 20- definition in sec 4.1.1, 2.: the condition does not take frame
> formulas with multiple attributes, nor equality between IRIs into
> account. I would suggest to work on the semantic level, giving the
> definition in terms of domain elements and the I_frame mapping. Also,
> when speaking about domain values, you can speak directly of strings,
> rather than strings obtained from constants. Similar for bullet 3 and
> the corresponding bullets in the definition in sec 4.1.2. In addition,
> when using a semantic definition in sec 4.1.2, you no longer need to
> do type matching; all you need to do is require that the value on the
> RIF side is equal to [typed value], when discarding the type label.

Ok. I did not think I could do it, but now I think I understand how... I will try to do that by tomorrow noon.

> 21- section 4.1.3: what is the operational semantics of Core? It's not
> in the Core spec.

Well, the spec says [1]: "RIF-Core is [also] a syntactic subset of RIF-PRD, and the semantics of RIF-Core is [also] identical to the semantics of RIF-PRD for that subset." And the primary semantics of PRD is the operational one.
[1] http://www.w3.org/TR/rif-core/#RIF-Core_Semantics

> 22- definition in section 4.1.2: the first condition in both 3a and 3b
> (the existence of a corresponding element in the XSD) seems redundant,
> since I_DM is based on a PSVI, and so must be schema-valid. Is that
> true?

The condition is needed to take substitution groups into account: you can have a substitution group where the head never occurs in the XML data, but the rule is written against the head element.

> 23- definition in section 4.1.2: right now I cannot foresee the
> consequences of condition 4. It seems that including all possible XML
> datatypes is a problem, for example we already identified that the
> duration datatype poses a problem for RIF. The question is whether
> there are possible other datatypes that pose problems. Datatypes that
> are derived from types that are in RIF do not need to be included in
> DTS, since their value spaces are necessarily subsets of D_Ind and
> there are syntactic representations of all the values.
> For this round of publication, I would suggest to add at least an
> editor's note saying that the condition will be further refined in
> future versions.

The condition is not about including all possible XML datatypes, only the ones that are used in the XML data document or the associated XML schema. The datatypes that were problematic for DTB were problematic because they were not usually implemented, or not consistent with the ones implemented, in most or mainstream rule engines. But if a data document or a schema uses a datatype that your implementation does not support, you are in trouble if you want to use it anyway, so I do not think this is a problem...

Anyway, I certainly have nothing against an editor's note to call attention to, and ask for feedback on, possibly unforeseen consequences.
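For illustration only, the semantic reading of string-matches suggested in comment 19 above (s equals [string value] after white space normalization of both) could be sketched as follows in Python. This is a minimal sketch, not text from the draft: it assumes that "white space normalization" means the XML Schema whiteSpace="collapse" behaviour, and the names normalize_ws and string_matches are hypothetical.

```python
import re

# Assumption: "white space normalization" is XML Schema's whiteSpace="collapse":
# tab, LF and CR count as spaces, runs collapse to one space, ends are trimmed.
_XML_WS_RUN = re.compile(r"[ \t\n\r]+")


def normalize_ws(text: str) -> str:
    """Collapse XML white space in text (hypothetical helper)."""
    return _XML_WS_RUN.sub(" ", text).strip(" ")


def string_matches(s: str, string_value: str) -> bool:
    """True iff s equals the [string value] once both are normalized."""
    return normalize_ws(s) == normalize_ws(string_value)
```

With this reading, a RIF-side string such as "hello world" would match an element whose [string value] is "  hello\n world ", while "helloworld" would not.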
> Editorial comments:
>
> 101- Sec 3.1, 4th paragraph: references should be included that
> explain what general and external parsed entities are and how they are
> expanded

Yes, many references need to be added. I will add that one by tomorrow noon.

> 102- There is a definition of an "instance of the data model", but not
> of the data model. Given that there is no such definition, I think it
> unwise to speak about instances of it, since this only makes the spec
> harder to understand

Hmmm, I thought that most of section 3 was about the definition of the data model... Sorry, I think that I do not understand your comment: can you reformulate it, please? Or give an example where the use of "instance of the data model" makes the spec harder to understand?

> 103- Section 4, first paragraph: why introduce the additional term
> "interpretation" here? I would suggest to stick with the term
> "structure", as in the other RIF specs.

Semantic structures are often called interpretations. And I am more familiar with that term. I will remove the introduction of the term there, but I hope that you will allow me its use wherever else it is used :-)

> 104- editor's note just above sec 4.1.1: yes, I think it should be
> said explicitly

Ok. I will make the change.

> 105- definition in section 4.1.1: the notation {I_DM} is somewhat
> redundant with the requirement in the definition that all references
> in I_DM have been resolved

Well, you prompted me to introduce that notation, when you remarked that it was the set of the elements in I_DM that had to be included in D_ind, not I_DM itself, which is a sequence with possibly duplicated element information items...

> Further questions:
>
> 1001- Is it true that it is guaranteed that every element and every
> attribute has a type in a PSVI infoset? In a schema it is possible to
> write such vague things as xs:any, thereby not actually specifying the
> type of a particular element.
See http://www.w3.org/TR/xpath-datamodel/#PSVI2NodeTypes :-)

Sorry, it is a bit late for me to think clearly at this time of day. I will try to respond to that tomorrow.

Thanks again for the comments. The version of the draft updated to take them into account should be ready tomorrow by noon.

Cheers,

Christian

IBM
9 rue de Verdun
94253 - Gentilly cedex - FRANCE
Tel. +33 1 49 08 35 00
Fax +33 1 49 08 35 10
Received on Tuesday, 15 June 2010 20:33:06 UTC