RE: EXI LC Comments (#14) : Adding empty strings to string table from John Schneider on 2009-03-19 (public-exi-comments@w3.org from March 2009)

From: John Schneider <john.schneider@agiledelta.com>
Date: Thu, 19 Mar 2009 13:22:42 -0700
To: "'FABLET Youenn'" <Youenn.Fablet@crf.canon.fr>, <public-exi-comments@w3.org>
Cc: "'????'" <fujisawa.jun@canon.co.jp>, "'RUELLAN Herve'" <Herve.Ruellan@crf.canon.fr>
Message-ID: <3BBC2F6B46104D2DB9682ECCCFC4620A@jcsdell8600>

Youenn,
 
Thank you again for your comments. I'm writing to respond to your comment
#14, which said:
 
> 14) Section 7.3.3
>
> Empty strings can occur as attribute values. Section 7.3.3 suggests that
> these empty strings are to be added to the indexing tables.
> Since the current literal EXI encoding is compact enough, it is reasonable
> not to add them to the table.

 

Yes, this is another very good point. Thanks again for your very thorough
review of the specification. The next version of the specification will be
updated to avoid adding the empty string to the string tables.
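
To illustrate the intended behaviour, here is a minimal sketch. It assumes a
simplified, single-partition value table; the ValueTable class and its method
names are hypothetical and are not taken from the specification, which
distinguishes local and global value partitions.

```python
class ValueTable:
    """Toy string table: maps each distinct value to a compact index."""

    def __init__(self):
        self._index = {}

    def encode(self, value):
        """Return ("hit", index) on a table hit, else ("literal", value)."""
        if value in self._index:
            return ("hit", self._index[value])
        # Miss: the value is written literally. Non-empty values are then
        # remembered; the empty string is deliberately never added, since
        # its literal encoding is already compact.
        if value != "":
            self._index[value] = len(self._index)
        return ("literal", value)

    def __len__(self):
        return len(self._index)
```

With this sketch, repeated empty attribute values are always written as
literals and never consume a table index, while other repeated values still
benefit from the table.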

 
    Thanks again,
 
    John


  _____  

From: public-exi-comments-request@w3.org
[mailto:public-exi-comments-request@w3.org] On Behalf Of FABLET Youenn
Sent: Thursday, November 06, 2008 8:16 AM
To: public-exi-comments@w3.org
Cc: ????; RUELLAN Herve
Subject: EXI LC Comments



Dear EXI WG,

Please find below some comments and questions regarding the EXI specification
last call working draft.

Regards,

                youenn fablet

 

1) Some facets are supported like minInclusive or maxExclusive.

What about supporting the length, minLength and maxLength facets, which
could be useful to better encode string or list sizes?

It should not be too difficult to support them based on current facet
support.

Is there a rationale to not include these facets? 

 

2) Guidelines for schema modeling

Is there any guideline regarding the relationship between EXI and schema
modeling?

Guidelines would be useful to understand the impact of some schema modeling
decisions on EXI encoding/decoding in terms of efficiency and compression.

For instance, it seems that the more global constructs there are (elements,
types, attributes), the bigger the generated grammars will be, since all
global schema constructs need to be kept (right?).

Having a lot of xs:all or maxOccurs="999" may also hurt efficiency.

See also question 3)

 

3) DataTypeRepresentationType question

I would like a confirmation of the current DataTypeRepresentationType
behaviour.
Let's have a schema with the following attribute definition:

                <xs:attribute name="test" type="xs:string"/>

In that case, the only way to change the encoding for @test values with the
DataTypeRepresentationType feature

is to redefine xs:string, which may have a great impact.

If we only want to change the @test values with the
DataTypeRepresentationType feature, we would need to

change the schema as follows:

                <xs:simpleType name="mystring">

                                 <xs:restriction base="xs:string"/>

                </xs:simpleType>

                <xs:attribute name="test" type="mystring"/>

DataTypeRepresentationType could then be used to redefine mystring.

Is this correct?

If so, the interoperability will generally be lost, since interoperable
DataTypeRepresentationType use is currently limited to XML Schema part 2
predefined types redefinition (end of section 7.4).

What about extending that behaviour to all simple types that have been
gathered by consuming the schema in use?

Is there any rationale behind that specific constraint?

 

4)  Typed encoding in schema-less mode

EXI enables limited typed encoding support in schema-less encoding.
Since only predefined types are supported, xsi:type seems mainly useful to
encode base64 chunks with the binary encoding.

Even in that case, usability is not so good: in some cases, elements
whose content is base64 also have attributes. For instance, ds:SignatureValue
has an optional ID attribute.

Of course, one could still use xsi:type=base64Binary in deviation mode but
interoperability may be pretty bad and putting a wrong xsi:type for the
purpose of compression seems broken.

Also note that:

                - Attribute values cannot be typed encoded with schema-less
grammars.

                - Other useful types like "list of float" or "list of integers"
cannot be used without external schema knowledge.

Improved out-of-the-box support of this use case would be very helpful.

 

5) EXI schema-less/schema-informed modes

Based on internal discussions and internal feedback, there is a general
assumption that the EXI specification somehow defines two separate modes
(schema-less and schema-informed).

While it is clearly stated in the specification that both modes easily
coexist in a single EXI stream, 

additional advertisement (maybe in the primer) of that feature may be good
for adoption. 

The latest published primer (Dec 2007) could perhaps be improved in that
respect.

 

Additionally, while EXI provides great flexibility in the amount of schema
put in grammars, 

the schemaID mechanism seems very minimal.

It seems that interoperable uses of schema-informed EXI will greatly
restrain the use of this flexibility.

Is there some additional work in that area that could or will be further
conducted?



6) Is it conformant to not follow the attribute order in the case of a
schema-informed grammar encoded element in deviation mode?

As stated in section 6, it seems not to be conformant.

In some cases, grammars can support attributes in no particular order, such
as the example below (correct me if I got something wrong).

<xs:complexType name="test">

                <xs:attribute name="name" type="xs:string"/>

                <xs:anyAttribute namespace="#any"/>

</xs:complexType>

<xs:element name="test" type="test"/>

 

While the benefit of ordering the attributes at the grammar level and the
general compression benefit for encoders to follow the given order are
obvious, I do not see a compelling reason to include this constraint in the
format itself. 

At the encoder side, the encoder may decide to order attributes or not. 

If encoding fails due to bad ordering (in strict mode) or if the compression
ratio is bad, the encoder can always decide to order the attributes.
At the decoder side, the decoder is only following the grammars so it does
not really care about the ordering.

There is even a drawback, as this is one (major?) difference between
schema-informed and schema-less processing.

Am I missing something obvious?
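
For illustration, an encoder that chooses to order attributes could do so
with a single sort. The sketch below assumes that the order expected by
schema-informed grammars is lexicographic by local name, then by namespace
URI; the function name and the tuple layout are hypothetical.

```python
def sort_attributes(attrs):
    """Sort (namespace_uri, local_name, value) tuples.

    Assumed ordering: by local name first, then by namespace URI.
    """
    return sorted(attrs, key=lambda attr: (attr[1], attr[0]))
```

An encoder could apply such a sort only when needed, for example after a
strict-mode encoding failure, and leave attributes untouched otherwise.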

 

7) RDF/XMP use case

This is more a general comment on specific XML/EXI use cases, notably RDF or
XMP documents where

no standard, well defined XML schemas are available.

These documents generally have some defined structures and types (RDF
schema, XMP schemas) but no 

well-defined XML schemas.

What would be the recommendation from the WG to enable good interoperable
EXI compression? Stick with schema-less encoding? Create an XML schema,
publish it and use it?

 

8)  Through careful checking of published EXI encoded streams

(Thanks again for the publication of these encoded examples by the way!),

Herve found some potential differences between the streams and the
specifications (see below). 

 

9)

Section 8.5.4.4.1:

  When adding production:

                                AT (qname) [schema-invalid value] Element?,?

to Elementi,j

Which next Symbol should be used?

Spec says Elementi,j

It would be more logical to use the symbol from the production:

                                AT (qname) [schema-valid value] Elementi,k

 

10)

Section 9.3

"Value channels that contain no more than 100 values" seems to mean: with
*strictly* less than 100 values.

In this paragraph, all comparisons should be made clearer using 'greater
than or equal to' and 'strictly greater than'.
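
The ambiguity can be made concrete with a small sketch. It assumes a
threshold of 100 values, as quoted above; the function names are hypothetical,
and which of the two readings is intended is exactly the open question.

```python
THRESHOLD = 100  # value count quoted from Section 9.3


def is_small_inclusive(value_count):
    """Literal reading of "no more than 100 values": count <= 100."""
    return value_count <= THRESHOLD


def is_small_strict(value_count):
    """Reading suggested by the published streams: count < 100."""
    return value_count < THRESHOLD


def disagreeing_counts(counts):
    """Return the value counts on which the two readings differ."""
    return [n for n in counts
            if is_small_inclusive(n) != is_small_strict(n)]
```

The two readings disagree only for a channel holding exactly 100 values,
which is why explicit comparison wording matters for interoperability.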

 

11)

Section 8.4.3

In schema-less mode, EE productions should be promoted to event code 0 when
used (if no EE production with an event code length of 1 already exists).

 

12)

Section 8.4.3

In Schema-less mode, when using the SE(*) production, should the creation of
the SE(qname) production be done before the evaluation of the element
content?

 

In most cases, this has no impact. In the case of recursive elements, this
leads to better compaction.

Moreover, in the case of recursive elements, the current specification seems
to imply creating several SE(qname) productions.
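
The effect on recursive elements can be sketched with a toy model. It assumes
one learned-production list per parent element qname and ignores the
distinction between start-tag and element-content grammar states; the function
name and the (qname, children) tree layout are hypothetical.

```python
from collections import defaultdict


def count_learned_productions(tree, create_before_content):
    """Count SE(qname) productions learned per parent element qname.

    tree is a (qname, [children]) tuple. When create_before_content is
    True, the production is added before the child content is evaluated.
    """
    learned = defaultdict(list)  # parent qname -> learned child qnames

    def encode(node):
        qname, children = node
        for child in children:
            child_qname = child[0]
            if child_qname in learned[qname]:
                # SE(qname) is already known: nothing new to learn.
                encode(child)
                continue
            # Otherwise the child is matched by SE(*).
            if create_before_content:
                learned[qname].append(child_qname)
                encode(child)
            else:
                encode(child)
                learned[qname].append(child_qname)

    encode(tree)
    return {q: len(p) for q, p in learned.items()}
```

For three nested elements, ("a", [("a", [("a", [])])]), creating the
production before evaluating the content yields one SE(a) production, whereas
creating it afterwards yields two in this model.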

 

13)

Section 8.4.3

xsi:schemaLocation attributes seem to be removed from the infoset before
encoding in the AgileDelta streams.

Is it by design or is it implementation related?

 

14)

Section 7.3.3

Empty strings can occur as attribute values. 

Section 7.3.3 suggests that these empty strings are to be added to the
indexing tables.
Since the current literal EXI encoding is compact enough, it is reasonable
not to add them to the table.
Received on Thursday, 19 March 2009 20:23:26 UTC