Re: EXI LC Comments from Daniel Peintner on 2009-02-26 (public-exi-comments@w3.org from February 2009)

From: Daniel Peintner <daniel.peintner@gmail.com>
Date: Thu, 26 Feb 2009 10:39:33 +0100
To: FABLET Youenn <Youenn.Fablet@crf.canon.fr>
Cc: public-exi-comments@w3.org, Efficient XML Interchange WG <member-exi-wg@w3.org>
Message-ID: <abbf52e10902260139w65d72f2t80698d7f8582e9bb@mail.gmail.com>

Hello Youenn,

thank you very much for your comment.

> 5) EXI schema-less/schema-informed modes
> Based on internal discussions and internal feedback, there is a
> general assumption that the EXI specification somehow defines
> two separate modes (schema-less and schema-informed).
>
> While it is clearly stated in the specification that both modes
> easily coexist in a single EXI stream, additional advertisement
> (maybe in the primer) of that feature may be good for adoption.
>
> The latest published primer (Dec 2007) could maybe be improved in that respect.

You are right in assuming that schema-informed and built-in grammars
may coexist in the same EXI stream. The first published EXI primer
document [1] uses incorrect terminology in that regard.
A revised version will be available soon and will integrate your comments,
besides other improvements and fixes for spec consistency issues.

Thanks again for pointing us to the problem,

-- Daniel

[1] http://www.w3.org/TR/2007/WD-exi-primer-20071219/





On Thu, Nov 6, 2008 at 5:15 PM, FABLET Youenn
<Youenn.Fablet@crf.canon.fr> wrote:
> Dear EXI WG,
>
> please find below some comments and questions regarding EXI specification
> last call working draft.
>
> Regards,
>
>                 youenn fablet
>
>
>
> 1) Some facets are supported like minInclusive or maxExclusive.
>
> What about support for the length, minLength and maxLength facets, which
> could be useful to better encode string or list sizes?
>
> It should not be too difficult to support them based on current facet
> support.
>
> Is there a rationale to not include these facets?
>
>
>
> 2) Guidelines for schema modeling
>
> Is there any guideline regarding the relationship between EXI and schema
> modeling?
>
> Guidelines would be useful to understand the impact of some schema modeling
> decisions on EXI encoding/decoding in terms of efficiency and compression.
>
> For instance, it seems that the more global constructs (elements, types,
> attributes) there are, the bigger the generated grammars will be, since all
> global schema constructs need to be kept (right?).
>
> Having a lot of xs:all or maxOccurs="999" may also hurt efficiency.
>
> See also question 3)
>
>
>
> 3) DataTypeRepresentationType question
>
> I would like a confirmation of the current DataTypeRepresentationType
> behaviour.
> Let's have a schema with the following attribute definition:
>
>                 <xs:attribute name="test" type="xs:string"/>
>
> In that case, the only way to change the encoding for @test values with the
> DataTypeRepresentationType feature
>
> is to redefine xs:string, which may have great impact.
>
> If we only want to change the @test values with the
> DataTypeRepresentationType feature, we would need to
>
> change the schema as follows:
>
>                 <xs:simpleType name="mystring">
>
>                                  <xs:restriction base="xs:string"/>
>
>                 </xs:simpleType>
>
>                 <xs:attribute name="test" type="mystring"/>
>
> DataTypeRepresentationType could then be used to redefine mystring.
>
> Is it correct?
>
> If so, the interoperability will generally be lost, since interoperable
> DataTypeRepresentationType use is currently limited to XML Schema part 2
> predefined types redefinition (end of section 7.4).
>
> What about extending that behaviour to all simple types that have been
> gathered by consuming the schema in use?
>
> Is there any rationale behind that specific constraint?
>
>
>
> 4)  Typed encoding in schema-less mode
>
> EXI enables limited typed encoding support in schema-less encoding.
> Since only predefined types are supported, xsi:type seems mainly useful to
> encode base64 chunks with the binary encoding.
>
> Even in that case, the usability is not so good: in some cases, elements
> whose content is base64 also have attributes. For instance ds:SignatureValue
> has an optional ID attribute.
>
> Of course, one could still use xsi:type=base64Binary in deviation mode but
> interoperability may be pretty bad and putting a wrong xsi:type for the
> purpose of compression seems broken.
>
> Also to be noted that:
>
>                 - Attribute values cannot be typed encoded with schema-less
> grammars.
>
>                 - Other useful types like "list of float","list of integers"
> cannot be used without external schema knowledge.
>
> Improved out-of-the-box support of this use case would be very helpful.
>
>
>
> 5) EXI schema-less/schema-informed modes
>
> Based on internal discussions and internal feedback, there is a general
> assumption that the EXI specification somehow defines two separate modes
> (schema-less and schema-informed).
>
> While it is clearly stated in the specification that both modes easily
> coexist in a single EXI stream,
>
> additional advertisement (maybe in the primer) of that feature may be good
> for adoption.
>
> The latest published primer (Dec 2007) could maybe be improved in that
> respect.
>
>
>
> Additionally, while EXI provides great flexibility in the amount of schema
> put in grammars,
>
> the schemaID mechanism seems very minimal.
>
> It seems that interoperable uses of schema-informed EXI will greatly
> restrain the use of this flexibility.
>
> Is there some additional work in that area that could or will be further
> conducted?
>
> 6) Is it conformant to not follow the attribute order in the case of a
> schema-informed grammar encoded element in deviation mode?
>
> As stated in section 6, it does not seem to be conformant.
>
> In some cases, grammars can support attributes in no particular order, such
> as the example below (correct me if I got something wrong).
>
> <xs:complexType name="test">
>
>                 <xs:attribute name="name" type="xs:string"/>
>
>                 <xs:anyAttribute namespace="##any"/>
>
> </xs:complexType>
>
> <xs:element name="test" type="test"/>
>
>
>
> While the benefit of ordering the attributes at the grammar level and the
> general compression benefit for encoders to follow the given order are
> obvious, I do not see a compelling reason to include this constraint in the
> format itself.
>
> At the encoder side, the encoder may decide to order attributes or not.
>
> If encoding fails due to bad ordering (in strict mode) or if the compression
> ratio is bad, the encoder can always decide to order the attributes.
> At the decoder side, the decoder is only following the grammars so it does
> not really care about the ordering.
>
> There is even a drawback as this is one (major?) difference between
> schema-informed and schema-less processing.
>
> Am I missing something obvious?
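
The encoder-side choice described in comment 6 can be sketched as follows. This is a minimal illustration, not an EXI implementation: the `Attribute` type and the function name are hypothetical, and the sort key (local name first, then namespace URI) is an assumption about the order a schema-informed grammar expects.

```python
from typing import List, NamedTuple


class Attribute(NamedTuple):
    uri: str         # namespace URI ("" when the attribute has no namespace)
    local_name: str  # local part of the qname
    value: str


def order_for_encoding(attributes: List[Attribute]) -> List[Attribute]:
    """Return the attributes in the assumed grammar order.

    Assumption: lexicographic by local name, then by namespace URI.
    """
    return sorted(attributes, key=lambda a: (a.local_name, a.uri))


attrs = [
    Attribute("", "name", "x"),
    Attribute("urn:b", "id", "1"),
    Attribute("urn:a", "id", "2"),
]
ordered = order_for_encoding(attrs)
```

An encoder could try the document order first and fall back to `order_for_encoding` only when encoding fails or compresses poorly, which is the flexibility the comment asks for.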
>
>
>
> 7) RDF/XMP use case
>
> This is more a general comment on specific XML/EXI use cases, notably RDF or
> XMP documents where
>
> no standard, well defined XML schemas are available.
>
> These documents generally have some defined structures and types (RDF
> schema, XMP schemas…) but no
>
> well defined XML schemas.
>
> What would be the recommendation from the WG to enable good interoperable
> EXI compression? Stick with schema-less encoding? Create an XML schema,
> publish it and use it?
>
>
>
> 8)  Through careful checking of published EXI encoded streams
>
> (Thanks again for the publication of these encoded examples by the way!),
>
> Herve found some potential differences between the streams and the
> specifications (see below).
>
>
>
> 9)
>
> Section 8.5.4.4.1:
>
>   When adding production:
>
>                                 AT (qname) [schema-invalid value] Element?,?
>
> to Element i,j
>
> Which next symbol should be used?
>
> Spec says Element i,j
>
> It would be more logical to use the symbol from the production:
>
>                                 AT (qname) [schema-valid value] Element i,k
>
>
>
> 10)
>
> Section 9.3
>
> "Value channels that contain no more than 100 values" seems to mean: with
> *strictly* less than 100 values.
>
> In this paragraph, all comparisons should be made clearer using 'greater or
> equal' and 'strictly greater'.
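
The ambiguity raised in comment 10 comes down to a single boundary value. The sketch below is only an illustration: the function names are hypothetical, and the threshold of 100 is taken from the quoted sentence. It compares the two possible readings of "no more than 100 values".

```python
THRESHOLD = 100  # value count quoted in comment 10


def is_small_inclusive(value_count: int) -> bool:
    """Reading 1: 'no more than 100' means value_count <= 100."""
    return value_count <= THRESHOLD


def is_small_strict(value_count: int) -> bool:
    """Reading 2: 'strictly less than 100' means value_count < 100."""
    return value_count < THRESHOLD


def disagreements(counts):
    """Return the value counts on which the two readings differ."""
    return [n for n in counts if is_small_inclusive(n) != is_small_strict(n)]


boundary_cases = disagreements(range(0, 201))
```

The two readings classify every channel identically except one holding exactly 100 values, which is why explicit wording such as 'greater or equal' matters for interoperability.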
>
>
>
> 11)
>
> Section 8.4.3
>
> In Schema-less mode, EE productions should be promoted to event code 0 when
> used (if no EE production with an event code length of 1 already exists).
>
>
>
> 12)
>
> Section 8.4.3
>
> In Schema-less mode, when using the SE(*) production, should the creation of
> the SE(qname) production be done before the evaluation of the element
> content?
>
>
>
> In most cases, this has no impact. In case of recursive elements, this leads
> to better compaction.
>
> Moreover, in case of recursive elements, the current specification seems to
> imply creating several SE(qname) productions.
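
The effect described in comment 12 can be reproduced with a toy model. This is a sketch under simplifying assumptions, not the EXI grammar algorithm: each element qname has one list of learned SE(qname) productions, an element is a hypothetical `(qname, children)` tuple, and only start-element events for children are recorded.

```python
from collections import defaultdict


def encode_children(element, grammars, events, learn_before_content):
    """Record SE events for the children of `element`, learning productions."""
    qname, children = element
    learned = grammars[qname]  # SE(qname) productions learned for this element
    for child in children:
        child_qname = child[0]
        if child_qname in learned:
            events.append(f"SE({child_qname})")
            encode_children(child, grammars, events, learn_before_content)
        else:
            events.append("SE(*)")
            if learn_before_content:
                learned.append(child_qname)  # add production before content
            encode_children(child, grammars, events, learn_before_content)
            if not learn_before_content:
                learned.append(child_qname)  # add production after content


def run(document, learn_before_content):
    grammars = defaultdict(list)
    events = []
    encode_children(document, grammars, events, learn_before_content)
    return events, dict(grammars)


# <a><a><a><a/></a></a></a>: an element nested inside itself.
NESTED = ("a", [("a", [("a", [("a", [])])])])
events_before, grammars_before = run(NESTED, learn_before_content=True)
events_after, grammars_after = run(NESTED, learn_before_content=False)
```

In this model, learning before the content yields one SE(*) followed by two SE(a) events and a single learned production, while learning after the content yields three SE(*) events and three duplicate SE(a) productions.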
>
>
>
> 13)
>
> Section 8.4.3
>
> xsi:schemaLocation attributes seem to be removed from the infoset before
> encoding in agile delta streams.
>
> Is it by design or is it implementation related?
>
>
>
> 14)
>
> Section 7.3.3
>
> Empty strings can occur as attribute values.
>
> Section 7.3.3 suggests that these empty strings are to be added in indexing
> tables.
> The current literal EXI encoding being compact enough, it is reasonable
> not to add them in the table.
>
>
Received on Thursday, 26 February 2009 09:40:13 UTC