- From: Jos de Bruijn <jos.debruijn@gmail.com>
- Date: Tue, 15 Jun 2010 14:08:09 +0200
- To: RIF <public-rif-wg@w3.org>
Christian, all,

Herewith my review of the XML-Data document as of 2010-06-15T09:25 CEST.

Overall, I think the document is going in the right direction. I believe it is in line with earlier discussions we had in the group concerning RIF+XML combinations. There are, however, several issues (mainly comments 10-23) that I think should be resolved before publication of the document as a public working draft.

Detailed comments are below. I will start with some issues which I believe require discussion in the group:

[update: in the current version of the document, issue 1 has been resolved by implementing solution a)]

1- The document assumes that the location argument in an Import directive in Core is optional (e.g., in the definition just before section 4.1). This is not the case; in Core, the location argument is mandatory. Thus, the document implicitly assumes an extension of Core. I think it is not desirable to define such an extension, since it will make the whole RIF landscape even more complex than it currently is. Furthermore, this extension is problematic, since in the presentation syntax it is not possible to distinguish between an Import statement having only a location and one having only a profile.

Now, the reason for having this extension in the first place is to be able to use an XML Schema as the data model of a ruleset without having to specify where the XML instance data comes from. Two obvious solutions that are in Core come to mind:
a) use a dummy URI to denote an empty XML instance document (e.g., rif:emptyXML)
b) put the XML Schema in the location field and define a profile for XML Schema (e.g., rif:xml-schema)

2- I find it slightly awkward to have strings in the attribute position of frame formulas. The way the semantics is defined, element and attribute names are represented as strings in that position.
For example, if you have
  <A B=""><C></C></A>
this corresponds (roughly) to the RIF formula
  ?x["attribute(B)"->"" "C" -> ""]
I think it would be natural to require all elements in an XML document to have namespaces (default namespaces are easy to add). However, attributes are a slightly more complicated issue, since the default namespace does not apply to them. Therefore, I don't really have an elegant solution in mind at the moment.

Further substantive comments:

10- Why give separate definitions for the semantics of Core+XML and BLD+XML combinations? The semantics of RIF Core is the same as that of BLD; the only difference between the two dialects is the syntax. I would suggest removing section 4.2 and saying that the semantics in section 4.1 applies to both dialects.

11- As discussed (privately), all element information items in an instance of the data model are meant to be distinct. This must be mentioned in the definition.

12- Is there a difference between QName and expanded QName? If so, what is the difference?

13- Section 3.2, 8. [typed value], first bullet: why do you deviate from the XQuery data model?

14- Section 4: what are XML instance and data documents, and how do they differ from XML documents? Both notions should be defined.

15- Section 4: why limit yourself to combination with only one XML document? In fact, the Core syntax does not have this limitation, so it is unclear how

16- A RIF document is interpreted using a semantic multi-structure, not a semantic structure. This needs to be taken into account in the definitions in section 4.

17- Notions of consistency and entailment, based on combined interpretations, need to be defined for RIF+XML combinations. Stating that these notions remain unchanged from Core does not work, since you do not have Core structures, but combined interpretations here.

18- Section 4.1, 4th paragraph: constants are not "in" any lexical space.
Constants have the form l^^s, where l is a string and s an IRI denoting a symbol space.

19- Section 4.1.1, first bullet: the definition of string-matches is a bit hard to read and overly restrictive (e.g., it does not account for rdf:PlainLiterals without language tags). I would suggest either matching L_dt(c) (here, L_dt is the lexical-to-value mapping of the datatype of c) with [string value] or, better yet, just giving a semantic definition: a string s string-matches i iff s=[string value] after white space normalization [of both s and [string value], I presume]. Similarly for the second bullet.

20- Definition in sec 4.1.1, 2.: the condition takes neither frame formulas with multiple attributes nor equality between IRIs into account. I would suggest working on the semantic level, giving the definition in terms of domain elements and the I_frame mapping. Also, when speaking about domain values, you can speak directly of strings, rather than strings obtained from constants. Similarly for bullet 3 and the corresponding bullets in the definition in sec 4.1.2. In addition, when using a semantic definition in sec 4.1.2, you no longer need to do type matching; all you need to do is require that the value on the RIF side is equal to [typed value], when discarding the type label.

21- Section 4.1.3: what is the operational semantics of Core? It's not in the Core spec.

22- Definition in section 4.1.2: the first condition in both 3a and 3b (the existence of a corresponding element in the XSD) seems redundant, since I_DM is based on a PSVI, and so must be schema-valid. Is that true?

23- Definition in section 4.1.2: right now I cannot foresee the consequences of condition 4. It seems that including all possible XML datatypes is a problem; for example, we already identified that the duration datatype poses a problem for RIF. The question is whether there are other datatypes that pose problems.
Datatypes that are derived from types that are in RIF do not need to be included in DTS, since their value spaces are necessarily subsets of D_Ind and there are syntactic representations of all the values. For this round of publication, I would suggest adding at least an editor's note saying that the condition will be further refined in future versions.

Editorial comments:

101- Sec 3.1, 4th paragraph: references should be included that explain what general and external parsed entities are and how they are expanded.

102- There is a definition of an "instance of the data model", but not of the data model. Given that there is no such definition, I think it unwise to speak about instances of it, since this only makes the spec harder to understand.

103- Section 4, first paragraph: why introduce the additional term "interpretation" here? I would suggest sticking with the term "structure", as in the other RIF specs.

104- Editor's note just above sec 4.1.1: yes, I think it should be said explicitly.

105- Definition in section 4.1.1: the notation {I_DM} is somewhat redundant with the requirement in the definition that all references in I_DM have been resolved.

Further questions:

1001- Is it guaranteed that every element and every attribute has a type in a PSVI infoset? In a schema it is possible to write such vague things as xs:any, thereby not actually specifying the type of a particular element.

--
Jos de Bruijn
Web: http://www.debruijn.net/
LinkedIn: http://at.linkedin.com/in/josdebruijn
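
As an illustration of the semantic definition suggested in comment 19 (s string-matches i iff s = [string value] after white space normalization of both), a minimal Python sketch follows. It assumes that "white space normalization" means the XML Schema "collapse" behaviour (tab, LF and CR become spaces, runs of spaces are merged, leading and trailing spaces are dropped); the comment itself does not say which normalization is meant. The function names are hypothetical and do not come from the XML-Data document.

```python
import re

# Assumption: "white space normalization" is XML Schema's "collapse",
# which only treats space, tab, LF and CR as white space.
_XML_WHITESPACE_RUN = re.compile(r"[ \t\n\r]+")


def collapse_whitespace(text: str) -> str:
    """Merge each run of XML white space into one space and trim the ends."""
    return _XML_WHITESPACE_RUN.sub(" ", text).strip(" ")


def string_matches(s: str, string_value: str) -> bool:
    """Hypothetical check: s equals [string value] once both are normalized."""
    return collapse_whitespace(s) == collapse_whitespace(string_value)
```

With this reading, string_matches("  hello   world\n", "hello world") holds, while string_matches("hello", "hello world") does not; no inspection of the lexical form or language tag of a constant is needed.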
Received on Tuesday, 15 June 2010 12:09:03 UTC