Re: Canonical EXI - CR Review

From: Don Brutzman <brutzman@nps.edu>
Date: Tue, 19 Apr 2016 09:21:42 -0700
To: "Peintner, Daniel (ext)" <daniel.peintner.ext@siemens.com>
CC: "public-exi@w3.org" <public-exi@w3.org>
Message-ID: <57165B16.6020209@nps.edu>

Thanks for the great work on the Canonical EXI Recommendation draft.  Review follows.

==========================================================
1. General questions:

a. What implementations exist?

=============================

b. Are there any test results?  Is there a test corpus?

=============================

d. A diagram comparing Canonical EXI use to Canonical XML use, for XML Encryption, would perhaps be illuminating.

Perhaps similar to Figure D-1. Canonical EXI used in Signature.

Shouldn't the envelope be included in this diagram?

=============================

e. Have we sought and received feedback comments from XML Security Working Group participants?

[1]	XML Security Working Group
	https://www.w3.org/2008/xmlsec

=============================

f. What implementation and testing work has been done with respect to XML Signature and XML Encryption?

Of related interest:

[2]	Test cases for Canonical XML 2.0
	W3C Working Group Note 18 June 2013
	https://www.w3.org/TR/2013/NOTE-xml-c14n2-testcases-20130618

=============================

g. Has Canonical EXI been cross-checked with

	XML Signature Syntax and Processing Version 2.0
	W3C Working Group Note 23 July 2015
	https://www.w3.org/TR/2015/NOTE-xmldsig-core2-20150723

=============================

h. Avoiding date-time canonicalization is debatable, and could lead to difficulties.

If a document wants a well-formatted date, a string could be possible.

Wondering, doesn't this mean a Canonical EXI processor would need to support all possible variations of a date, including various internationalization (I18N) languages?  Hardly seems efficient.

Comparison of two EXI streams with equivalent date values that are expressed in different forms would fail.  Hardly seems canonical.

We should look at how XML Canonicalization, XML Signature and XML Encryption handle dates.  Perhaps XML Schema has a default form.

I think we need date canonicalization.
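To illustrate the comparison problem, here is a minimal sketch of what date-time canonicalization could look like. It assumes the XML Schema canonical form for dateTime (timezoned values expressed in UTC with a trailing "Z") is the target; the function name canonical_datetime is hypothetical and this is not text from the Canonical EXI draft.

```python
from datetime import datetime, timezone

def canonical_datetime(lexical: str) -> str:
    """Map an ISO 8601 / xsd:dateTime-style string to one UTC form (illustrative only)."""
    text = lexical.strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    value = datetime.fromisoformat(text)
    zone = ""
    if value.tzinfo is not None:
        # Timezoned values are shifted to UTC so equal instants compare equal.
        value = value.astimezone(timezone.utc)
        zone = "Z"
    result = value.strftime("%Y-%m-%dT%H:%M:%S")
    if value.microsecond:
        # Keep fractional seconds, without trailing zeros.
        result += "." + f"{value.microsecond:06d}".rstrip("0")
    return result + zone

# Two different lexical forms of the same instant yield the same string.
print(canonical_datetime("2016-04-19T09:21:42-07:00"))
print(canonical_datetime("2016-04-19T18:21:42+02:00"))
```

Without a step of this kind, the two inputs above would presumably produce different EXI streams even though they denote the same instant.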

=============================

i.  UnsignedInteger maximum value is problematic for floats and doubles, see below.

=============================

j. URL data type normalization is needed; otherwise consistent/canonical comparison is not possible.

This probably should go in new section 4.5.9.

Suggested reference:

[3]	Uniform Resource Identifier (URI): Generic Syntax
	IETF RFC 3986
	https://tools.ietf.org/html/rfc3986
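
A rough sketch of the kind of normalization meant here, covering only a subset of the RFC 3986 rules: lowercase scheme and host, uppercase hex digits in percent-escapes, and removal of a default port. The names normalize_uri and DEFAULT_PORTS are hypothetical; dot-segment removal and decoding of unreserved characters are deliberately left out.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports, for illustration only.
DEFAULT_PORTS = {"http": 80, "https": 443}

def _upper_escapes(text: str) -> str:
    """Uppercase the hex digits of every percent-escape, e.g. %2f -> %2F."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)

def normalize_uri(uri: str) -> str:
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    netloc = ""
    if parts.hostname is not None:
        host = parts.hostname  # urlsplit already lowercases the host
        if ":" in host:
            host = f"[{host}]"  # restore brackets around IPv6 literals
        netloc = host
        if parts.username is not None:
            userinfo = parts.username
            if parts.password is not None:
                userinfo += ":" + parts.password
            netloc = userinfo + "@" + netloc
        if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
            netloc += f":{parts.port}"
    return urlunsplit((
        scheme,
        netloc,
        _upper_escapes(parts.path),
        _upper_escapes(parts.query),
        _upper_escapes(parts.fragment),
    ))

print(normalize_uri("HTTP://WWW.Example.COM:80/a%2fb"))
```

Two URI strings that differ only in these respects would then compare equal after normalization.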

==========================================================
==========================================================

2. Editorial comments:

=============================
section 1.2, title:

	1.2 Need of Canonical EXI
to
	1.2 Need for Canonical EXI
or
	1.2 Motivation

=============================
section 1.2, first sentence:

	W3C's Efficient XML Interchange Format
to
	W3C's Efficient XML Interchange (EXI) Format

=============================
section 1.2, last paragraph first sentence:

	"EXI canonicalization provides the first type-aware canonicalization scheme"
to
	"EXI canonicalization provides a type-aware canonicalization scheme"

=============================
section 1.2, last sentence:

	"can help cure some of the well-known XML security bottlenecks."
to
	"can help address some of the well-known processing bottlenecks for XML security."

Is there a reference for such bottleneck issues?

An additional, separate motivating paragraph for XML Signature and XML Encryption would be useful here.

=============================
1.4 Limitations

	"based on the knowledge of the used EXI options"
to
	"based on the applicable EXI options"

and
	"Moreover, there is not one canonical EXI stream but many according to"
to
	"Moreover, there is not one canonical EXI stream but potentially many variants, according to"

and
	"and the according EXI options and fidelity settings."
to
	"as well as the corresponding EXI options and fidelity settings."

=============================
3. Canonical EXI Header

third paragraph

append comma after
	"A Canonical EXI Header MUST NOT begin with the optional EXI Cookie"

=============================
numbered item 3:

	"When the alignment option compression is set, pre-compress MUST be used instead of compression."

Since this statement is counterintuitive and somewhat puzzling, adding a brief reason would be helpful to the reader.  Perhaps:

	"This setting prevents further compression during processing of the Canonical EXI stream that might eliminate further information, as described in ____section___."

=============================
numbered item 4 Note paragraph:

	"Nevertheless the burden of requiring the schemaId element has been found justifiable due to the increased security."

append
	" and strict representations of Canonical EXI"

=============================
4.1 EXI Alignment Options and Stream,

second paragraph last sentence,

	"using the alignment option pre-compression."
to
	"using the alignment option pre-compression instead."

=============================

last sentence third paragraph, change

	"In a Canonical EXI stream padding bits, if necessary, MUST always be represented as a sequence of 0 (zero) bits."
to
	"If used, the padding bits in a Canonical EXI stream MUST always be represented as a sequence of 0 (zero) bits."

=============================
4.2 EXI Event Selection

	"followed by the according event content."
to
	"followed by the corresponding event content."

=============================
	"the canonical EXI form prescribes which event and respectively which event code has to be chosen."
to
	"the canonical EXI form prescribes which event (and respectively which event code) has to be chosen."

=============================
	"that an EXI processors has"
to
	"that an EXI processor has"

=============================
4.2.2 Use the event that matches most precisely

	"non evolving"
to
	"non-evolving"

=============================
	"MUST use the event that matches most precisely."
to
	"MUST use the event that matches the following prioritized heuristics most precisely."

=============================

	"IF the accurateness is the same use the event with the least event code parts."
to
	"IF the representational accuracy is unaffected, then use the event with the fewest event code parts."

=============================

	"The verification solely bases on EXI grammars and EXI datatypes."
to
	"The verification is solely based on EXI grammars and EXI datatypes."

=============================
4.3.1 Exclude extraneous events

	"The EXI grammars permit EXI processors to include extraneous CH ("") events"
to
	"The EXI grammars permit EXI processors to include extraneous empty-string CH ("") events"

=============================
4.3.3 Whitespace Handling

	"One exception to this statement are whitespace characters."
to
	"One exception to this statement are significant whitespace characters."

=============================

	"(i.e., all whitespaces are preserved)."
to
	"(i.e., all whitespace characters are preserved)."

=============================

	"Not in all situations it is possible to respect whitespace handling rules."
to
	"It is not possible to respect whitespace-handling rules in all situations."

=============================

	The value " 123 "
to
	For example, the value " 123 " (with leading and trailing space characters)

=============================

	"Use-cases requiring whitespaces might considering to use Preserve.lexicalValues option set to true."
to
	"Use-cases requiring whitespace preservation might consider using the Preserve.lexicalValues option set to true."

=============================

	"When the current xml:space is not "preserve" we differ between simple data and complex data."

Remove the "we" in this sentence.  Not really clear, please add an explanatory sentence or express it more thoroughly.

=============================
4.3.3.1 Simple Whitespace Data

	"When the grammar in effect is a schema-less grammar all whitespaces MUST be preserved."
to
	"When the grammar in effect is a schema-less grammar, then all whitespaces MUST be preserved."

However I thought (and hope) that schema is now required?  So perhaps this sentence should be omitted.

=============================
4.4 Stream Order

third paragraph

	"according to the NS prefix."
to
	"according to each NS prefix."

=============================
"Note:

Optimizations such as pruning insignificant xsi:type values (e.g., xsi:type="xsd:string" for string values) or insignificant xsi:nil values (e.g., xsi:nil="false") are prohibited for a Canonical EXI processor."

Not clear why this is the case; please explain.

=============================
4.5.1 Unsigned Integer

"Canonical EXI processors MUST use the Unsigned Integer datatype representation even if a value goes beyond the value 2147483647."

What are expectations and requirements if this occurs?

Respective handling (where to truncate) would seem to be different when representing the components of a float.

This limits floating point numbers to ~9 significant digits of accuracy?

What about doubles?

Maximum/minimum expressible values need to be clearly listed for each derived type.

Seems like a significant problem that needs to be addressed.  Perhaps consecutive Unsigned Integer values for higher resolution.
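
For reference when discussing the range question, here is a small sketch of the EXI Unsigned Integer encoding as I understand it from EXI Format 1.0: a sequence of octets, least significant 7-bit group first, with the high bit of each octet flagging that another octet follows. The function names are hypothetical and the sketch assumes byte alignment, which is not how a bit-packed stream lays the octets out.

```python
def encode_unsigned_integer(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("Unsigned Integer cannot represent negative values")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # continuation bit: more octets follow
        else:
            out.append(group)         # last octet: continuation bit clear
            return bytes(out)

def decode_unsigned_integer(data: bytes) -> int:
    """Inverse of encode_unsigned_integer for a single encoded value."""
    value = 0
    for shift, octet in enumerate(data):
        value |= (octet & 0x7F) << (7 * shift)
        if not octet & 0x80:
            break
    return value

# 2147483647 needs five octets; one more than that still encodes without truncation.
print(encode_unsigned_integer(2147483647).hex())
print(encode_unsigned_integer(2147483648).hex())
```

If this reading is right, the representation itself has no fixed upper bound, so the open question is what processors are required to support rather than where the format truncates.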

=============================
4.5.3 Decimal

add further requirements as bullets:

- Omit leading zeroes in integral portion.
- Omit trailing zeroes in fractional portion.
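
A minimal sketch of the two bullets above, operating on the lexical form rather than on the EXI encoding. The function name canonical_decimal is hypothetical, and dropping the sign of a zero value is an assumption of this sketch, not a quote from the draft.

```python
def canonical_decimal(lexical: str) -> str:
    """Strip leading zeroes (integral part) and trailing zeroes (fractional part)."""
    text = lexical.strip()
    sign = ""
    if text[0] in "+-":
        sign = "-" if text[0] == "-" else ""
        text = text[1:]
    integral, _, fractional = text.partition(".")
    integral = integral.lstrip("0") or "0"   # keep a single zero if nothing is left
    fractional = fractional.rstrip("0")
    if integral == "0" and not fractional:
        sign = ""                            # assumption: zero carries no sign
    return sign + integral + ("." + fractional if fractional else "")

print(canonical_decimal("007.2500"))
print(canonical_decimal("-000.500"))
```

With these rules, "7.25", "07.25" and "7.2500" all map to one form, which is what a canonical comparison would seem to need.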

=============================
4.5.4 Float

	"If A2 and B are equivalent per the rule 1 above, A and B are equivalent."
to
	"If A2 and B are equivalent values per the rule 1 above, A and B are equivalent."

However I'm not convinced that the second rule shifting exponents for comparison is correct.  If leading and trailing zeroes are handled consistently, won't the mantissa values (all of the significant digits) always be the same?
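
To make the question concrete, here is a sketch under the assumption that an EXI float is an integer mantissa paired with a base-10 exponent. In that model the same value can be written with different pairs (for example 100E0 and 1E2), so trailing zeroes in the mantissa have to be moved into the exponent before mantissas can be compared. The function name is hypothetical, mapping a zero mantissa to exponent 0 is an assumption, and special values and range limits are ignored.

```python
from typing import Tuple

def canonical_float(mantissa: int, exponent: int) -> Tuple[int, int]:
    """Shift trailing decimal zeroes of the mantissa into the exponent."""
    if mantissa == 0:
        return (0, 0)  # assumption: zero is written with exponent 0
    while mantissa % 10 == 0:
        mantissa //= 10
        exponent += 1
    return (mantissa, exponent)

# 100E0 and 1E2 denote the same value and end up with the same pair.
print(canonical_float(100, 0))
print(canonical_float(1, 2))
```

Once both operands are reduced this way, equivalence is a plain comparison of the two pairs, which may be all that the exponent-shifting rule is meant to achieve.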

=============================
4.5.5 Date-Time

Not liking this lack of date-time canonicalization; see discussion near top of message (item 1.h).

=============================
4.5.6 String and String Table

	"length prefixed sequence of characters."
to
	"length-prefixed sequence of characters."

=============================
4.5.6 String and String Table

	"EXI processors MUST first try to use the "local" compact identifier and only when this is not successful the global compact identifier."
to
	"EXI processors MUST first try to use the "local" compact identifier, and only when this is not successful then try to use the global compact identifier."

=============================

	"respect the XML schema whiteSpace facet, if available."
to
	"respect the XML schema whiteSpace facet, if defined."

=============================
4.5.7 Restricted Character Sets

	"Restricted Character Sets in EXI enable to restrict the characters of the string datatype."
to
	"Restricted Character Sets are applied in EXI to restrict the characters of the string datatype."

=============================

	"followed by the Unicode code point of the character"
to
	"followed by the Unicode code point for each character"

=============================
References

Acronym for

	Efficient XML Interchange (EXI) Format 1.0 (Second Edition)
to
	EXI

=============================

Similarly reduce acronyms for

	Efficient XML Interchange (EXI) Impacts
	Efficient XML Interchange (EXI) Best Practices
	Efficient XML Interchange (EXI) Profile
to
	EXI Impacts
	EXI Best Practices
	EXI Profile

=============================

A number of reference entries are missing "W3C Recommendation", "W3C Working Group Note" etc.

=============================

URL values do not need trailing slash character.

=============================

CharModel to "Character Model"

=============================

CharModelNorm to "Character Model Identity" or somesuch

=============================

Check commas in references, some missing.  Will send .pdf of my handwritten notes to assist.

=============================
B Design Decisions (Non-Normative)

	"This section discusses a number of key decision points."
to
	"This section discusses a number of key decision points in the design of Canonical EXI."

=============================
B.1 Relationship to XML Security

"w.r.t." changed to "with respect to"

=============================
capitalization

	"When the XML canonicalization algorithm"
to
	"When the XML Canonicalization algorithm"

=============================
then

	"When the XML canonicalization algorithm preserves comments the EXI fidelity option"
to
	"When the XML Canonicalization algorithm preserves comments in a document, the EXI fidelity option"

=============================
Unclear, more explanation is needed please:

"Caution: The primary objective of Canonical EXI has been to eliminate the associated overhead of plain-text XML when building a canonical form. This means that in the case of signing Canonical XML, EXI can be used on intermediary nodes. On the contrary, it is not always possible to use XML on intermediary nodes when Canonical EXI has been used for signing."

=============================
B.2 No Unicode Normalization

	"Furthermore, applications that must solve this problem typically enforce character model normalization"
to
	"Furthermore, applications that must solve this problem can typically enforce character model normalization"

=============================
append period

	"must not change the code points"
to
	"must not change the code points."

=============================
B.3 No Date-Time Canonicalization

No longer convinced this is the right answer; see technical points above (item 1.h).

=============================
Example C-3. An algorithm for converting float values to the canonical form

	"Examine the float value and extract the portion before and after the decimal point."
to
	"Examine the float value and extract the two portions before and after the decimal point."

=============================
D.1 Signature Processing Steps

Figure D-1. Canonical EXI used in Signature

Need label on red arrow, is that bit-wise comparison?

Should add (or refer to) steps in the comparison algorithm illustrated in this figure.

=============================
D.1 Signature Processing Steps, last numbered list items

	"1. What gets hashed and"
to
	"1. What gets hashed, and"

=============================

	"2. How to exchange and share EXI options other than out-of-band or as part of the EXI stream"
to
	"2. How to exchange and share EXI options (other than out-of-band) as part of the EXI stream"

=============================
D.2 Exchange EXI Options (Best Practices)

	"so that for example it can be successfully"
to
	"so that for example such practices can be successfully"

=============================

Wondering if redundant definition of EXI Option values in both signature and the document is yet another option?

=============================
D.2 Exchange EXI Options (Best Practices)

(at end of document)

Need to conclude this section somehow, the reader is left hanging.

Perhaps D.2.4 Decision Criteria or somesuch.

==========================================================

On 3/30/2016 3:24 AM, Peintner, Daniel (ext) wrote:
> All,
>
> With the latest updates I believe we resolved all issues w.r.t. to Canonical EXI.
>
> Before moving to Candidate Recommendation (CR) I ask everyone to do a review of the document [1].
>
> A diff compared to the last call document can be found here [2].
>
> Thanks,
>
> -- Daniel
>
> [1] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html
> [2] http://services.w3.org/htmldiff?doc1=http://www.w3.org/TR/exi-c14n/&doc2=https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html

Totally impressive work.  Again thanks for an important contribution and significant efforts!

all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman@nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
Received on Tuesday, 19 April 2016 16:22:17 UTC
