Re: [XML-Data] review of current draft from Christian De Sainte Marie on 2010-06-18 (public-rif-wg@w3.org from June 2010)

From: Christian De Sainte Marie <csma@fr.ibm.com>
Date: Fri, 18 Jun 2010 14:36:28 +0200
To: Jos de Bruijn <jos.debruijn@gmail.com>
Cc: RIF <public-rif-wg@w3.org>
Message-ID: <OF8F027AE8.E4A9D44C-ONC1257744.00324E5A-C1257746.0045426F@fr.ibm.com>

Hi Jos,

I implemented the changes as discussed. Here below are some additional 
comments and explanations, inlined.

Jos de Bruijn <jos.debruijn@gmail.com> wrote on 16/06/2010 10:04:55:
> >>
> >> 10- why give separate definitions for the semantics of Core+XML and
> >> BLD+XML combinations? 
> 
> BLD is a syntactic extension, but is semantically the same. So, right
> now there is a lot of duplication, in particular the definition of
> combined interpretation. It is not necessary to define the semantics
> twice, so don't do it.

Ok, that makes sense.

> > I am not convinced why we should do otherwise, but if there is
> > overwhelming support to rewrite everything the other way round, I will
> > do it.
> 
> I'm not talking about rewriting anything, only about removing the
> duplication.

Well, that required some rewriting, still...

Anyway, it is done :-)

> >> 12- Is there a difference between QName and expanded QName? If so,
> >> what is the difference?

I included the definitions in the glossary.
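
As a rough illustration of the distinction (a sketch only, not the glossary 
text): it assumes the usual XML Namespaces reading, where a QName is the 
lexical prefix:local form and an expanded QName is the pair of namespace 
name and local name obtained by resolving the prefix. The helper name 
expand_qname and the bindings table are made up for the example.

```python
from typing import Dict, Optional, Tuple


def expand_qname(qname: str, in_scope: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Resolve a lexical QName against in-scope namespace bindings.

    Hypothetical helper: `in_scope` maps prefixes to namespace names, with
    the empty string standing for the default namespace, if any.
    Simplification: the default namespace is applied to every unprefixed
    name, although XML does not apply it to attribute names.
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        # Unprefixed name: fall back to the default namespace (None if absent).
        return (in_scope.get(""), qname)
    if prefix not in in_scope:
        raise ValueError(f"unbound prefix: {prefix!r}")
    return (in_scope[prefix], local)


bindings = {"xs": "http://www.w3.org/2001/XMLSchema"}
print(expand_qname("xs:string", bindings))
```

Two QNames with different prefixes bound to the same namespace name expand 
to the same pair, which is why comparisons are done on the expanded form.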

> >> 13- section 3.2, 8. [typed value], first bullet: why do you deviate
> >> from the XQuery data model?
> >
> > Because we need a handle to the element information itself, when it is
> > object-like (that is, element-only children), so we can dig into it. And
> > XDM, in that case, defines the typed value as being undefined, which is
> > useless in our case...
> 
> Ok. Perhaps add this explanation to the document.

Done.

> >> 14- section 4: what are XML instance and data documents, and what
> >> is the difference with XML documents? Both notions should be defined.

I replaced, everywhere, "XML data document" and "XML instance document" 
with "XML data" or "XML document" as appropriate (depending on whether it 
was about the content or the container).

> >> 15- section 4: why limit yourself to combination with only one XML
> >> document? In fact, the Core syntax does not have this limitation, so
> >> it is unclear how

I extended the definitions to cover multiple documents.

The case of multiple XML documents, mixing some with XML schemas and some 
without, probably has to be detailed more explicitly: I did not do it in 
this version, but I added an editor's note.

> >> 16- a RIF document is interpreted using a semantic multi-structure,
> >> not a semantic structure. This needs to be taken into account in the
> >> definitions in section 4.
> 
> I guess it should be fine to say in an editor's note that the
> semantics is broadly in line with the Core semantics and that the
> definitions of satisfaction, consistency, and entailment will be
> included in the next version.

I added a section explaining how the combined semantics of a RIF BLD doc 
and XML data relates to the semantics of the RIF BLD doc alone 
(multi-structure, models, entailment etc).

Do we still need an editor's note? I did not add one, for now.

> >> 17- notions of consistency and entailment, based on combined
> >> interpretations, need to be defined for RIF+XML combinations. Stating
> >> that these notions remain unchanged from Core does not work, since you
> >> do not have Core structures, but combined interpretations here.
> >
> > [...]
> 
> As I mentioned above, including an editor's note along the lines as I
> mentioned should be fine for this publication.

Same response as above. And same question: do we, still, need an editor's 
note? I did not add one, for now.

> >> 18- section 4.1, 4th paragraph: constants are not "in" any lexical
> >> space. Constants have the form l^^s, where l is a string and s an IRI
> >> denoting a symbol space.

Yes, the literal of a constant is in the lexical space, right.

I think that I removed any reference to a lexical space altogether, though 
(except when excerpted from XDM). So, that should respond to your comment, 
anyway.

> >> 19- section 4.1.1, first bullet: the definition of string-matches is a
> >> bit hard to read and overly restrictive (e.g., it does not account for
> >> rdf:PlainLiterals without language tags). I would suggest to either
> >> match L_dt(c) (here, L_dt is the lexical-to-value mapping of the
> >> datatype of c) with [string value] or, better yet, just give a
> >> semantic definition: a string s string-matches i iff s=[string value]
> >> after white space normalization [of both s and [string value], I
> >> presume]. Similar for the second bullet.

I changed the first bullet slightly, to:
- a RIF constant, c, string-matches the [string value], s, of an 
information item, i, in an instance of the data model, if and only if c is 
a constant with type xs:string or a type derived from xs:string and c = s, 
after white space normalization;

And I simplified the second bullet to:
- a RIF list, l, string-matches the [string value], s, of an information 
item, i, in an instance of the data model, if and only if s = L, after 
white space normalization, where L is the order-preserving concatenation 
of the elements of l, after flattening l, and with a white space added 
between each element. 

Does that make the definition easier to read?

It explicitly excludes the case of rdf:PlainLiteral, though (the reason 
is that I cannot account for it without reading the rdf:PlainLiteral spec 
first, which I have not done yet :-)

Is that a problem for the WD?
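
To make the two bullets above concrete, here is a small sketch of how they 
could be read. It is an illustration under stated assumptions, not the 
definition in the draft: white space normalization is taken to be XSD-style 
collapsing of space, tab, CR and LF; a RIF list is modelled as a nested 
Python list of strings; only the built-in types derived from xs:string are 
recognized; and all function names are invented for the example.

```python
import re
from typing import Iterator

# Built-in XSD types derived from xs:string. User-defined derivations are
# not covered by this sketch (assumption).
STRING_TYPES = {
    "xs:string", "xs:normalizedString", "xs:token", "xs:language",
    "xs:NMTOKEN", "xs:Name", "xs:NCName", "xs:ID", "xs:IDREF", "xs:ENTITY",
}


def normalize_space(text: str) -> str:
    """Collapse runs of XML white space to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def flatten(items: list) -> Iterator[str]:
    """Yield the leaves of a nested list, preserving their order."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def constant_string_matches(literal: str, datatype: str, string_value: str) -> bool:
    """First bullet: a string-typed constant equals s after normalization."""
    if datatype not in STRING_TYPES:
        return False
    return normalize_space(literal) == normalize_space(string_value)


def list_string_matches(items: list, string_value: str) -> bool:
    """Second bullet: flatten the list, join with a space, compare to s."""
    joined = " ".join(flatten(items))
    return normalize_space(joined) == normalize_space(string_value)


print(constant_string_matches("  hello   world ", "xs:string", "hello world"))
print(list_string_matches(["a", ["b", "c"]], "a  b\n c"))
```

As in the text, rdf:PlainLiteral is deliberately left out: a constant of 
that type simply does not string-match in this sketch.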

> >> 20- definition in sec 4.1.1, 2.: the condition does not take frame
> >> formulas with multiple attributes, nor equality between IRIs into
> >> account. I would suggest to work on the semantic level, giving the
> >> definition in terms of domain elements and the I_frame mapping. Also,
> >> when speaking about domain values, you can speak directly of strings,
> >> rather than strings obtained from constants. Similar for bullet 3 and
> >> the corresponding bullets in the definition in sec 4.1.2. In addition,
> >> when using a semantic definition in sec 4.1.2, you no longer need to
> >> do type matching; all you need to do is require that the value on the
> >> RIF side is equal to [typed value], when discarding the type label.
> >
> > [...]
>
> If you are having some trouble with the definition, I might be able to
> help. But I'm afraid I do not have time to work on it before June
> 22nd, so you could leave things as they are and include an editor's
> note saying that the definition will be changed along the lines of my
> comment for the next version of the document.

I used the I mapping for the comparison of the objects, attributes and 
classes, not frame values, at this point.

I think that I see how I can do it, and I will propose the change if (i) I 
have the time to do it before June 22, and (ii) somebody has the time to 
check the correctness of the definition after I change it, and before the 
publication. Which is unlikely.

So, I propose that we go with the current definitions and an editor's 
note. I have to add the note, still.

> >> 21- section 4.1.3: what is the operational semantics of Core? It's not
> >> in the Core spec.

The semantics of RIF Core and XML data combinations is now defined wrt 
that of BLD and PRD combinations with XML data.

> >> 22- definition in section 4.1.2: the first condition in both 3a and 3b
> >> (the existence of a corresponding element in the XSD) seems redundant,
> >> since I_DM is based on a PSVI, and so must be schema-valid. Is that
> >> true?
> >
> > The condition is needed to take substitution groups into account: you can
> > have a substitution group where the head never occurs in the XML data, but
> > the rule is written against the head element.

The substitution groups and type derivations that required the extra 
condition in the definition of RIF Core+schema valid XML data are, now, 
taken care of by the inheritance mechanism that is built into the semantics 
of subclasses. And, so, the extra conditions have been removed.
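
A small sketch of the idea, for illustration only: it assumes that each 
member of a substitution group (or each derived type) is declared a subclass 
of its head (or base type), so that membership propagates upwards as in RIF 
(a # b and b ## c give a # c). The class names and the subclass_of table are 
invented for the example; the actual mapping is the one in the draft.

```python
from typing import Dict, Set


def superclasses(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return cls together with every class reachable through subclass_of."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return seen


def is_member_of(element_class: str, target: str,
                 subclass_of: Dict[str, Set[str]]) -> bool:
    """An item of element_class is also a member of each of its superclasses."""
    return target in superclasses(element_class, subclass_of)


# Hypothetical schema: ex:shirt substitutes for the head ex:product, and
# ex:tshirt substitutes for ex:shirt.
subclass_of = {"ex:shirt": {"ex:product"}, "ex:tshirt": {"ex:shirt"}}

# A rule written against the head matches data that only contains ex:tshirt.
print(is_member_of("ex:tshirt", "ex:product", subclass_of))
```

With this reading, a rule written against the head element applies even when 
the head never occurs in the XML data, without a separate condition.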

> >> 23- definition in section 4.1.2: right now I cannot foresee the
> >> consequences of condition 4. It seems that including all possible XML
> >> datatypes is a problem, for example we already identified that the
> >> duration datatype poses a problem for RIF. The question is whether
> >> there are possible other datatypes that pose problems. Datatypes that
> >> are derived from types that are in RIF do not need to be included in
> >> DTS, since their value spaces are necessarily subsets of D_Ind and
> >> there are syntactic representations of all the values.
> >> For this round of publication, I would suggest to add at least an
> >> editor's note saying that the condition will be further refined in
> >> future versions.
> >
> > The condition is not about including all possible XML datatypes, but the
> > ones that are used in the XML data doc or the associated XML schema.
> >
> > The datatypes that were problematic for DTB were problematic because they
> > were not usually implemented, or consistent with the one implemented, in
> > most or mainstream rule engines.
> 
> There is also the semantic problem of duration: the definition of the
> datatype makes things ambiguous, so you, in the end, do not know what
> the entailments are.
> 
> >
> > But if a data doc or a schema uses a datatype that your implementation
> > does not support, you're in trouble if you want to use it anyway, so I do
> > not think this is a problem...
> >
> > Anyway, I certainly have nothing against an editor's note to call attention
> > to, and ask feedback on, possibly unforeseen consequences.
> 
> Ok, good.
> Then, you did not respond to the second part of my comment:
> 
> >> Datatypes that
> >> are derived from types that are in RIF do not need to be included in
> >> DTS, since their value spaces are necessarily subsets of D_Ind and
> >> there are syntactic representations of all the values.

I will add an editor's note to cover those issues (and raise the issues).

> >> Editorial comments:
> >>
> >> 101- Sec 3.1, 4th paragraph: references should be included that
> >> explain what general and external parsed entities are and how they are
> >> expanded

I still need to add most of the references needed in the text. Will do that 
by Monday.

> >> 102- There is a definition of an "instance of the data model", but not
> >> of the data model. Given that there is no such definition, I think it
> >> unwise to speak about instances of it, since this only makes the spec
> >> harder to understand
> >
> > Hmmm, I thought that most of section 3 was about the definition of the
> > data model...
> 
> Aha! I did not realize that, because there is no definition of it. In
> fact, you don't refer to it other than in the phrase "instance of the
> data model". It is actually not clear how it is an instance (e.g., a
> sequence of attribute information items does not appear to be an
> instance); it is simply a sequence of element information items. So
> why mention "the data model" at all?
>
> > Sorry, I think that I do not understand your comment: can you reformulate
> > it, please? Or give an example where the use of "instance of the data
> > model" makes the spec harder to understand?
> 
> In the phrase, "the data model" does not give any added information to
> the reader; it only serves to distract from the content.

On the other hand, if we just say "an instance", nobody will know an 
instance of what. Except if we define "instance" as meaning "instance of 
the data model", but, then, we are at our starting point again...

I do not think I understand the issue. We are describing a data model of 
XML documents, and we are, then, talking about instances of that data 
model: where is the problem?

Ok. I will add a paragraph in the introduction of section 3, to try to 
make it clear what is the data model that is described in that section...

> >> 103- Section 4, first paragraph: why introduce the additional term
> >> "interpretation" here? I would suggest to stick with the term
> >> "structure", as in the other RIF specs.

I removed the term "interpretation", except where it is explicitly defined.

> >> 104- editor's note just above sec 4.1.1: yes, I think it should be
> >> said explicitly

I included the comment in the main text, and removed the EdNote.

> >> 105- definition in section 4.1.1: the notation {I_DM} is somewhat
> >> redundant with the requirement in the definition that all references
> >> in I_DM have been resolved

I changed the text, so that the requirement is only in one place in the 
definitions.

> >> Further questions:
> >>
> >> 1001- Is it true that it is guaranteed that every element and every
> >> attribute has a type in a PSVI infoset? In a schema it is possible to
> >> write such vague things as xs:any, thereby not actually specifying the
> >> type of a particular element.
> >
> > See http://www.w3.org/TR/xpath-datamodel/#PSVI2NodeTypes :-)

Yes, there are cases that the current version of the spec does not account 
for.

I will add an editor's note and raise the issue.

Cheers,

Christian

IBM
9 rue de Verdun
94253 - Gentilly cedex - FRANCE
Tel. +33 1 49 08 35 00
Fax +33 1 49 08 35 10


Sauf indication contraire ci-dessus:/ Unless stated otherwise above:
Compagnie IBM France
Siege Social : 17 avenue de l'Europe, 92275 Bois-Colombes Cedex
RCS Nanterre 552 118 465
Forme Sociale : S.A.S.
Capital Social : 611.451.766,20 €
SIREN/SIRET : 552 118 465 03644
Received on Friday, 18 June 2010 12:37:08 UTC