comments on RIF+XML

I finally got some time to look at the document more closely.

Overall the document is moving along nicely, but some of its key aspects are
still poorly written and very hard to understand. It also seems to spend too
much space on the infoset material, which is not actually used much.
Detailed comments follow.


Sec 2, item 1:
    What is "the location subelement"? You have to describe things using
    the presentation language first.

    Item 2: you say that this IRI specifies the location of XML data
    without an associated schema, and then immediately talk about the
    schema for that document in item 3.
    Regarding item 3: how is the schema identified?

    This whole section is too vague.
    You should simply give the *exact* syntax of Import, define exactly
    the possible values of its arguments, and then explain how it
    extends BLD.
    Note also that the location argument of Import cannot be
    missing; otherwise, it is not compatible with BLD.

Sec 3.1, sentences 1-3 after the first definition: I cannot understand these
sentences. Perhaps you should explain what you are trying to do here.

Sentence right before Def (Instance of the data model): this sentence is
completely redundant.

When talking about types, it is better to use the term datatype, since this
is what they are called in XML. The term "type" feels as if it comes out of the

Sequence is defined after it is used (in the definition of instance of the
data model). It would be better to fold the definition of sequence into the
definition of instance, since it is such a trivial concept.

Expanded QName: is it really a set or a triple?

Sections 3.2-3.5: is there a need to repeat these from the Infoset spec?
I mean, how much of that detailed, lengthy, and boring material is really used?
It stands in the way of getting to the real substance.
If it is kept at all, it should go in an appendix.

Definition (Associated XML schema): in Import, the location CANNOT be
missing. If it is missing, the document is not a BLD document. You have to
carefully rethink the syntax and the meaning of the Import directive.

Example 4.1
rule behave as -> rule behaves as
The ending box of the example is placed in a strange way.

Define, or link to a definition of, "white space normalization."

Sec 4.1.2: what does "modulo a substitution" mean?

Def RIF BLD+schemaless: I_DM is not well-defined. What does "set of element
info items constructed from an infoset" mean?
Which infoset exactly? Does it depend on the XML document being imported?
A better approach is to require that I_DM be the set of all
*possible* element info items that can be constructed out of the set of
Unicode characters, or something along those lines.
This will also simplify 4.1.3.

Item 2 in that definition: I_truth(I>o ...) = t. The > looks spurious.

Example 4.3: "RIF non-normative presentation syntax": delete "non-normative".
It is not factually correct and is not needed here.

Ex 4.4:
... document, without a namespace nor an ... -> ... document without a namespace or an ...

schema valid XML -> schema-valid XML (everywhere).

Definition (RIF BLD+schema-valid)
I_DM must be defined as suggested for BLD+schemaless (or in some similar, but
logically correct, way).

I find much of the definitions of BLD+schemaless and BLD+schema-valid
very hard to parse, which is unacceptable. Items 2.c in both
definitions are completely out of hand. You have to find a way to
present these things so that mere mortals can parse and understand these
definitions without blowing up their mental stacks.
Maybe introduce some intermediate notation for matching and type-matching.
Maybe define val(e,[infosetproperty]) and use it.
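To illustrate the last suggestion, here is a minimal Python sketch of the kind of accessor I mean. ElementItem, val, and name_matches are hypothetical names made up for this illustration, not anything from the document or a library, and the element item model is heavily simplified (only four infoset properties; attributes as a plain dict). The bracketed property names follow the Infoset convention. The point is that once val exists, a matching condition becomes one short line instead of an inline expansion of infoset properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


# Hypothetical, heavily simplified stand-in for an Infoset element
# information item; only four of its properties are modelled.
@dataclass
class ElementItem:
    namespace_name: Optional[str]
    local_name: str
    children: List[Any] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)


def val(e: ElementItem, prop: str) -> Any:
    """Hypothetical accessor: value of infoset property `prop` of element `e`."""
    properties = {
        "[namespace name]": e.namespace_name,
        "[local name]": e.local_name,
        "[children]": e.children,
        "[attributes]": e.attributes,
    }
    return properties[prop]  # raises KeyError for properties not modelled here


def name_matches(e: ElementItem, qname: Tuple[Optional[str], str]) -> bool:
    """Hypothetical matching condition stated via val: does `e` carry
    the given (namespace name, local name) pair?"""
    ns, local = qname
    return val(e, "[namespace name]") == ns and val(e, "[local name]") == local


book = ElementItem("http://example.org/ns", "book", attributes={"id": "b1"})
print(name_matches(book, ("http://example.org/ns", "book")))  # True
```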

Also, it is unclear to me why you define the truth of o[slot->v] for lists
so that only a sublist of the children of e must match the list v.
In any case, all of this has to be reworked and made more understandable.

Def BLD+schema-valid, item 4: DTS is extended to include all simple XML types.
Simple XML types include lists and unions, and these cannot be represented
using the RIF datatypes in DTS.

Item 5.b is also out of hand.

Received on Monday, 19 July 2010 06:11:05 UTC