AW: Canonical EXI - CR Review from Peintner, Daniel (ext) on 2016-04-26 (public-exi@w3.org from April 2016)

From: Peintner, Daniel (ext) <daniel.peintner.ext@siemens.com>
Date: Tue, 26 Apr 2016 12:41:10 +0000
To: Don Brutzman <brutzman@nps.edu>
CC: "public-exi@w3.org" <public-exi@w3.org>
Message-ID: <D94F68A44EB1954A91DE4AE9659C5A980FE916D0@DEFTHW99EH1MSX.ww902.siemens.net>
Hi Don,

Thank you very much for your thorough review.
Please find comments inline.

Thanks,

-- Daniel



> Thanks for the great work on the Canonical EXI Recommendation draft.
> Review follows.
>
> ==========================================================
> 1. General questions:
>
> a. What implementations exist?
>
> =============================

In general, most existing EXI implementations already produce canonical EXI streams. That said, there are some cases where Canonical EXI limits the freedom of the EXI spec to guarantee the same octet stream.

> b. Are there any test results?  Is there a test corpus?
>
> =============================

There will be a test corpus. So far we have only just started the work.

> d. A diagram comparing Canonical EXI use to Canonical XML use, for XML
> Encryption, would perhaps be illuminating.
>
> Perhaps similar to Figure D-1. Canonical EXI used in Signature.
>
> Shouldn't the envelope be included in this diagram?

I agree that additional information and illustrations are helpful.
I will try to work on more diagrams for the non-normative sections.
Support there is always welcome!

> =============================
>
> e. Have we sought and received feedback comments from XML Security
> Working Group participants?
>
> [1]     XML Security Working Group
>          https://www.w3.org/2008/xmlsec
>
> =============================

We did receive feedback from people within the XML Security Working Group that led to some of the proposals in Section "D.2 Exchange EXI Options".
More feedback would be appreciated though.

> f. What implementation and testing work has been done with respect to
> XML Signature and XML Encryption?
>
> Of related interest:
>
> [2]     Test cases for Canonical XML 2.0
>         W3C Working Group Note 18 June 2013
>          https://www.w3.org/TR/2013/NOTE-xml-c14n2-testcases-20130618
>
> =============================

XML Signature and XML Encryption are more related to the EXI spec as mentioned in appendix section "B.1 Relationship to XML Security".

> g. Has Canonical EXI been cross-checked with
>
>         XML Signature Syntax and Processing Version 2.0
>         W3C Working Group Note 23 July 2015
>          https://www.w3.org/TR/2015/NOTE-xmldsig-core2-20150723
>
> =============================

Canonical EXI is based on the XML Infoset to ensure interoperability.

> h. Avoiding date-time canonicalization is debatable, and could lead to difficulties.
>
> If a document wants a well-formatted date, a string could be possible.
>
> Wondering, doesn't this mean a Canonical EXI processor would need to support
> all possible variations of a date, including various internationalization
> (I18N) languages?  Hardly seems efficient.
>
> Comparison of two EXI streams with equivalent date values that are expressed
> in different forms would fail.  Hardly seems canonical.
>
> We should look at how XML Canonicalization, XML Signature and XML Encryption
> handle dates.  Perhaps XML Schema has a default form.
>
> I think we need date canonicalization.

There has been a long debate on the mailing list on whether to canonicalize date-time or not. The outcome was summarized in appendix section "B.3 No Date-Time Canonicalization".

Already today XML signatures fail if the form of the string-based date-time differs!
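
To illustrate the point, here is a minimal Python sketch (not taken from any spec). The two timestamps and the choice of SHA-256 are assumptions made for the example. Both strings denote the same instant, yet a digest over the string form differs, which is what a string-based signature would see.

```python
import hashlib
from datetime import datetime

# Two lexical forms of the same instant (example values, chosen for illustration).
form_a = "2016-04-26T12:41:10+00:00"
form_b = "2016-04-26T14:41:10+02:00"


def digest(text):
    """Hash the string form, as a string-based signature would."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The values are equal as instants ...
same_instant = datetime.fromisoformat(form_a) == datetime.fromisoformat(form_b)
# ... but the digests over the lexical forms are not.
same_digest = digest(form_a) == digest(form_b)

print(same_instant, same_digest)
```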

> =============================
>
> i.  UnsignedInteger maximum value is problematic for floats and doubles, see below.
>
> =============================

In my opinion, EXI processors should not receive values that they cannot handle. Moreover, in the case of EXI STRICT this situation already exists today.

Note: EXI Float/Double have a limited range (the range of the mantissa is -(2^63) to 2^63-1 and the range of the exponent is -(2^14-1) to 2^14-1).
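
As a rough sketch of what that range means in practice, the following Python check assumes the limits stated in the EXI spec (mantissa from -(2^63) to 2^63-1, exponent from -(2^14-1) to 2^14-1). The function name is hypothetical, and special values such as INF and NaN are deliberately left out.

```python
# Range limits of the EXI Float representation (mantissa and base-10 exponent).
MANTISSA_MIN = -(2 ** 63)
MANTISSA_MAX = 2 ** 63 - 1
EXPONENT_MIN = -(2 ** 14 - 1)
EXPONENT_MAX = 2 ** 14 - 1


def fits_exi_float(mantissa, exponent):
    """Return True if (mantissa, exponent) lies inside the EXI Float range.

    Hypothetical helper for illustration; special values are not handled.
    """
    return (MANTISSA_MIN <= mantissa <= MANTISSA_MAX
            and EXPONENT_MIN <= exponent <= EXPONENT_MAX)


print(fits_exi_float(2 ** 63 - 1, 0))  # largest mantissa
print(fits_exi_float(2 ** 63, 0))      # one past the largest mantissa
```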

> j. url data type normalization is needed, otherwise consistent/canonical comparison.
>
> This probably should go in new section 4.5.9.
>
> Suggested reference:
>
> [3]     Uniform Resource Identifier (URI): Generic Syntax
>         IETF RFC 3986
>          https://tools.ietf.org/html/rfc3986
>
> ==========================================================

The XML Schema datatype anyURI maps to EXI string. Most EXI processors do not have any knowledge of the underlying XML Schema datatype.
Introducing such requirements may prevent the support of Canonical EXI in current processors.

I could see us adding some information or best practices. Does this seem reasonable?

> ==========================================================
>
> 2. Editorial comments:
>
> =============================
> section 1.2, title:
>
>         1.2 Need of Canonical EXI
> to
>         1.2 Need for Canonical EXI
> or
>         1.2 Motivation

Changed to "Motivation". Thanks!

> =============================
> section 1.2, first sentence:
>
>         W3C's Efficient XML Interchange Format
> to
>         W3C's Efficient XML Interchange (EXI) Format

Agree.

> =============================
> section 1.2, last paragraph first sentence:
>
>         "EXI canonicalization provides the first type-aware canonicalization scheme"
> to
>         "EXI canonicalization provides a type-aware canonicalization scheme"

Agree.

> =============================
> section 1.2, last sentence:
>
>         "can help cure some of the well-known XML security bottlenecks."
> to
>         "can help address some of the well-known processing bottlenecks for XML security."

Agree.

> Is there a reference for such bottleneck issues?
>
> An additional, separate motivating paragraph for XML Signature and XML Encryption would be useful here.

I did not find a good reference, although it seems to be clear to many people that XML security causes processing bottlenecks.

> =============================
> 1.4 Limitations
>
>         "based on the knowledge of the used EXI options"
> to
>         "based on the applicable EXI options"

I believe that this modification would change the meaning.

> and
>         "Moreover, there is not one canonical EXI stream but many according to"
> to
>         "Moreover, there is not one canonical EXI stream but potentially many variants, according to"

Agree.

> and
>         "and the according EXI options and fidelity settings."
> to
>         "as well as the corresponding EXI options and fidelity settings."

Agree.

> =============================
> 3. Canonical EXI Header
>
> third paragraph
>
> append comma after
>         "A Canonical EXI Header MUST NOT begin with the optional EXI Cookie"

Agree.

> =============================
> numbered item 3:
>
>         "When the alignment option compression is set, pre-compress MUST
> be used instead of compression."
>
> Since this statement is counterintuitive and somewhat puzzling, adding a
> brief reason would be helpful to the reader.  Perhaps:
>
>         "This setting prevents further compression during processing of
> the Canonical EXI stream that might eliminate further information, as
> described in ____section___.

I would suggest adding the following note:
"EXI Compression uses the standard DEFLATE Compressed Data Format defined by RFC 1951, which does not define a canonical representation."
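
A small Python sketch can make the reason concrete. It uses the standard zlib module in raw DEFLATE mode; the sample payload and the helper names are assumptions for illustration only. The same input compressed with two different settings yields different octets, although both decompress to the identical payload.

```python
import zlib


def raw_deflate(data, level):
    """Compress to a raw DEFLATE stream (negative wbits: no zlib header)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(blob):
    """Decompress a raw DEFLATE stream."""
    return zlib.decompress(blob, -15)


# Example payload, chosen only because it is repetitive and compressible.
payload = b"<a>canonical</a>" * 50

stored = raw_deflate(payload, 0)  # level 0: stored (uncompressed) blocks
packed = raw_deflate(payload, 9)  # level 9: best compression

print(len(stored), len(packed))
print(raw_inflate(stored) == raw_inflate(packed) == payload)
```

Both outputs are valid DEFLATE streams for the same data, so a byte-wise comparison or a digest over the compressed form is not stable across encoders.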

> =============================
> numbered item 4 Note paragraph:
>
>         "Nevertheless the burden of requiring the schemaId element has
> been found justifiable due to the increased security."
>
> append
>         " and strict representations of Canonical EXI"

I am not sure about the intent of this addition.

> =============================
> 4.1 EXI Alignment Options and Stream,
>
> second paragraph last sentence,
>
>         "using the alignment option pre-compression."
> append
>         "using the alignment option pre-compression instead."

Agree.

> =============================
> last sentence third paragraph, change
>
>         "In a Canonical EXI stream padding bits, if necessary, MUST always
> be represented as a sequence of 0 (zero) bits."
> to
>         "If used, the padding bits in a Canonical EXI stream MUST always
> be represented as a sequence of 0 (zero) bits."

Agree.

> =============================
> 4.2 EXI Event Selection
>
>         "followed by the according event content."
> to
>         "followed by the corresponding event content."

Agree.

> =============================
>         "the canonical EXI form prescribes which event and respectively
> which event code has to be chosen."
> to
>         "the canonical EXI form prescribes which event (and respectively
>  which event code) has to be chosen."

Agree.

> =============================
>         "that an EXI processors has"
> to
>         "that an EXI processor has"

Thanks!

> =============================
> 4.2.2 Use the event that matches most precisely
>
>         "non evolving"
> to
>         "non-evolving"

Agree.

> =============================
>         "MUST use the event that matches most precisely."
> to
>         "MUST use the event that matches the following prioritized
>  heuristics most precisely."

Agree.

> =============================

>         "IF the accurateness is the same use the event with the least
>  event code parts."
> to
>         "IF the representational accuracy is unaffected, then use the
> event with the least number event code parts."

Agree.

> =============================

>         "The verification solely bases on EXI grammars and EXI datatypes."
> to
>         "The verification is solely based on EXI grammars and EXI datatypes.

Agree.

> =============================
> 4.3.1 Exclude extraneous events
>
>         "The EXI grammars permit EXI processors to include extraneous CH ("") events"
> to
>         "The EXI grammars permit EXI processors to include extraneous empty-string CH ("") events"

Agree.

> =============================
> 4.3.3 Whitespace Handling
>
>         "One exception to this statement are whitespace characters."
> to
>         "One exception to this statement are significant whitespace characters."

Agree.

> =============================
>
>         "(i.e., all whitespaces are preserved)."
> to
>         "(i.e., all whitespace characters are preserved)."

Agree.

> =============================
>
>         "Not in all situations it is possible to respect whitespace handling rules."
> to
>         "It is not possible to respect whitespace-handling rules in all situations."

Agree.

> =============================
>
>         The value " 123 "
> append
>         For example, the value " 123 " (with a leading space character)

Agree.

> =============================
>
>         "Use-cases requiring whitespaces might considering to use
> Preserve.lexicalValues option set to true."
> to
>         "Use-cases requiring whitespace preservation might consider
>  using the Preserve.lexicalValues option set to true."

Agree.

> =============================
>
>         "When the current xml:space is not "preserve" we differ between
> simple data and complex data."
>
> Remove the "we" in this sentence.  Not really clear, please add an
>  explanatory sentence or express it more thoroughly.

I rephrased it to "one" and added links to the following subsections 4.3.3.1 and 4.3.3.2.

> =============================
> 4.3.3.1 Simple Whitespace Data
>
>         "When the grammar in effect is a schema-less grammar all whitespaces MUST be preserved."
> to
>         "When the grammar in effect is a schema-less grammar, then all whitespaces MUST be preserved."
>
> However I thought (and hope) that schema is now required?  So perhaps this sentence should be omitted.

Thanks!
W.r.t. "schema is required" the answer is no. Not all use-cases have schemas and we cannot require it.

> =============================
> 4.4 Stream Order
>
> third paragraph
>
>         "according to the NS prefix."
> to
>         "according to each NS prefix."

I am a bit reluctant here because I think it somewhat changes the meaning.

> =============================
> "Note:
>
> Optimizations such as pruning insignificant xsi:type values (e.g.,
> xsi:type="xsd:string" for string values) or insignificant xsi:nil values
> (e.g., xsi:nil="false") are prohibited for a Canonical EXI processor."
>
> not clear why this is the case, please explain

For example, adding xsi:nil="false" to many elements is possible. That said, it does not change the document, given that the value is false. Optimized processors might remove such xsi:nil="false" attributes to reduce the stream size. Canonical EXI forbids removing such attributes.

Do you have a proposal to make it clearer? Remove it? It is just a note, and previous statements such as "SHALL NOT change the input sequence" already say the same.

> =============================
> 4.5.1 Unsigned Integer
>
> "Canonical EXI processors MUST use the Unsigned Integer datatype
> representation even if a value goes beyond the value 2147483647."
>
> What are expectations and requirements if this occurs?

The EXI specification allows you to fall back to string OR to still represent it as a sequence of octets terminated by an octet with its most significant bit set to 0.

Canonical EXI requires you to represent it as a sequence of octets.
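
The octet-sequence form can be sketched in Python as follows. This is a minimal illustration of the EXI Unsigned Integer layout (7 value bits per octet, least significant group first, most significant bit set on every octet except the last); the function name is hypothetical.

```python
def encode_unsigned_integer(value):
    """Encode a non-negative integer as an EXI Unsigned Integer octet sequence.

    Each octet carries 7 value bits, least significant group first. The most
    significant bit is 1 while more octets follow and 0 on the final octet.
    """
    if value < 0:
        raise ValueError("Unsigned Integer cannot be negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set
        else:
            out.append(low7)         # final octet, MSB is 0
            return bytes(out)


# 2147483648 is one past 2147483647 and still encodes without a string fallback.
print(encode_unsigned_integer(2147483648).hex())
```

Because Python integers are unbounded, the same loop works for values of any magnitude, which matches the "arbitrary magnitude" nature of the representation.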

> Respective handling (where to truncate) would seem to be different
> when representing the components of a float.
>
> This limits floating point numbers to ~9 significant digits of accuracy?
>
> What about doubles?

EXI float/doubles have already limited ranges in the EXI spec (see https://www.w3.org/TR/exi/#encodingFloat).

> Maximum/minimum expressable values need to be clearly listed for each derived type.
>
> Seems like a significant problem that needs to be addressed.  Perhaps
> consecutive Unsigned Integer values for higher resolution.

I am not sure about the proposal.

> =============================
> 4.5.3 Decimal
>
> add further requirements as bullets:
>
> - Omit leading zeroes in integral portion.
> - Omit trailing zeroes in fractional portion.

In typed encoding, the requirements stated above are already met.
In the EXI Decimal representation there is no way to represent leading or trailing zeroes.
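
To show why, here is a minimal Python sketch of how a lexical decimal maps onto the three parts of the EXI Decimal representation (a sign, the integral portion as an Unsigned Integer, and the fractional portion as an Unsigned Integer with its digits reversed). The function name is hypothetical and the parsing is simplified for illustration.

```python
def exi_decimal_components(lexical):
    """Split a lexical decimal into (sign, integral, reversed fraction).

    Simplified illustration: the fractional digits are reversed before being
    read as an integer, so leading zeros of the fraction survive while
    trailing zeros (and leading zeros of the integral part) vanish.
    """
    text = lexical.strip()
    sign = 1 if text.startswith("-") else 0
    text = text.lstrip("+-")
    integral, _, fraction = text.partition(".")
    integral_value = int(integral or "0")
    fraction_value = int(fraction[::-1] or "0")
    return sign, integral_value, fraction_value


# "1.5", "1.50" and "01.5" all map to the same components.
print(exi_decimal_components("1.5"))
print(exi_decimal_components("1.50"))
print(exi_decimal_components("01.5"))
# A leading zero in the fraction is significant and is preserved.
print(exi_decimal_components("1.05"))
```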

> =============================
> 4.5.4 Float
>
>         "If A2 and B are equivalent per the rule 1 above, A and B are equivalent."
> to
>         "If A2 and B are equivalent values per the rule 1 above, A and B are equivalent."
>
> However I'm not convinced that the second rule shifting exponents for comparison
> is correct.  If leading and trailing zeroes are handled consistently, won't the
> mantissa values (all of the significant digits) always be the same?

We explored these rules and I think they should be OK. Do you have an example/illustration that causes issues?

> =============================
> 4.5.5 Date-Time
>
> Not liking this lack of date-time canonicalization, discussion near top of message.

See the comments at the beginning.

> =============================
> 4.5.6 String and String Table
>
>         "length prefixed sequence of characters."
> to
>         "length-prefixed sequence of characters."

Agree.

Adopted from the EXI spec, which seems to be inconsistent here as well. Sometimes "length prefixed" and sometimes "length-prefixed".

> =============================
> 4.5.6 String and String Table
>
>         "EXI processors MUST first try to use the "local" compact
> identifier and only when this is not successful the global compact identifier."
> to
>         "EXI processors MUST first try to use the "local" compact identifier,
>  and only when this is not successful then try to use the global compact identifier."

Agree.

> =============================
>
>         "respect the XML schema whiteSpace facet, if available."
> to
>         "respect the XML schema whiteSpace facet, if defined."

Agree.

> =============================
> 4.5.7 Restricted Character Sets
>
>         "Restricted Character Sets in EXI enable to restrict the characters of the string datatype."
> to
>         "Restricted Character Sets are applied in EXI to restrict the characters of the string datatype."

Agree.

> =============================
>
>         "followed by the Unicode code point of the character"
> to
>         "followed by the Unicode code point for each character"

> =============================
> References
>
> Acronym for
>
>         Efficient XML Interchange (EXI) Format 1.0 (Second Edition)
> to
>         EXI

Changed to "EXI Format 1.0" because in some previous prose (e.g., Abstract) it is always referred to as "format".

> =============================
>
> Similarly reduce acronyms for
>
>         Efficient XML Interchange (EXI) Impacts
>         Efficient XML Interchange (EXI) Best Practices
>         Efficient XML Interchange (EXI) Profile
> to
>         EXI Impacts
>         EXI Best Practices
>         EXI Profile

Done.

> =============================
>
> A number of reference entries are missing "W3C Recommendation", "W3C Working Group Note" etc.

Shall we add that information? I checked the EXI spec and we did not do so in the past.
see https://www.w3.org/TR/exi/#References

> =============================
>
> URL values do not need trailing slash character.
>
> =============================

I usually add a slash for directories, while I do not add any slash for files such as "rfc2119.txt".
That said, I think it does not really matter.

> CharModel to "Character Model"
>
> =============================
>
> CharModelNorm to "Character Model Identity" or somesuch

Agree.

> =============================
>
> Check commas in references, some missing.  Will send .pdf of my handwritten notes to assist.

Thanks!

> =============================
> B Design Decisions (Non-Normative)
>
>         "This section discusses a number of key decision points."
> to
>         "This section discusses a number of key decision points in the design of Canonical EXI."

Agree.

> =============================
> B.1 Relationship to XML Security
>
> "w.r.t." changed to "with respect to"

Agree.

> =============================
> capitalization
>
>         "When the XML canonicalization algorithm"
> to
>         "When the XML Canonicalization algorithm"

Agree.

> =============================
> then
>
>         "When the XML canonicalization algorithm preserves comments the
>  EXI fidelity option"
> to
>         "When the XML Canonicalization algorithm preserves comments in
>  a document, the EXI fidelity option"

Agree.

> =============================
> Unclear, more explanation is needed please:
>
> "Caution: The primary objective of Canonical EXI has been to eliminate
> the associated overhead of plain-text XML when building a canonical form.
> This means that in the case of signing Canonical XML, EXI can be used on
> intermediary nodes. On the contrary, it is not always possible to use XML
> on intermediary nodes when Canonical EXI has been used for signing."

I added an example: superfluous namespace declarations may be deleted, as is the case in Canonical XML.
EXI does not delete any namespace declaration!

> =============================
> B.2 No Unicode Normalization
>
>         "Furthermore, applications that must solve this problem typically
> enforce character model normalization"
> to
>         "Furthermore, applications that must solve this problem can
> typically enforce character model normalization

Agree.

> =============================
> append period
>
>         "must not change the code points"
> to
>         "must not change the code points."

Thanks.

> =============================
> B.3 No Date-Time Canonicalization
>
> No longer convinced this is the right answer, technical points above.

See comments above.

> =============================
> Example C-3. An algorithm for converting float values to the canonical form
>
>         "Examine the float value and extract the portion before and after the decimal point."
> to
>         "Examine the float value and extract the two portions before and after the decimal point."

Agree.

> =============================
> D.1 Signature Processing Steps
>
> Figure D-1. Canonical EXI used in Signature
>
> Need label on red arrow, is that bit-wise comparison?
>
> Should add (or refer to) steps in the comparison algorithm illustrated in this figure.

The figure is just meant to sketch the process. It is not different from XML Signature.
I am not sure what we can add here.

> =============================
> D.1 Signature Processing Steps, last numbered list items
>
>         "1. What gets hashed and"
> to
>         "1. What gets hashed, and"

Agree.

> =============================
>
>         "2. How to exchange and share EXI options other than out-of-band or as part of the EXI stream"
> to
>         "2. How to exchange and share EXI options (other than out-of-band) as part of the EXI stream"

Agree.

> =============================
> D.2 Exchange EXI Options (Best Practices)
>
>         "so that for example it can be successfully"
> to
>         "so that for example such practices can be successfully"

Agree.

> =============================
>
> Wondering if redundant definition of EXI Option values in both signature
> and the document is yet another option?

I think this is implicitly allowed. One can send an EXI stream with schema information while the signature is built without schema knowledge or other options.

> =============================
> D.2 Exchange EXI Options (Best Practices)
>
> (at end of document)
>
> Need to conclude this section somehow, the reader is left hanging.
>
> Perhaps D.2.4 Decision Criteria or somesuch.
>
> ==========================================================

Mhh, any good proposal?
What about the following:

"D.2.4 Decision Criteria

The previous subsections provide best practices for exchanging EXI options, but use-cases are not limited to the aforementioned proposals."
Received on Tuesday, 26 April 2016 12:42:04 UTC