Re: Canonical EXI - CR Review

Yes, +1, and thanks for the close scrutiny and great progress.  One further response.

On 5/18/2016 1:44 PM, John Schneider wrote:
> Daniel,
>
> Thank you very much for taking the time to review our comments. I’m glad they were helpful! I’ve answered the questions you posed and provided some additional comments in-line below.
>
> 	All the best!,
>
> 	John
> [...]
>
>>> 8. Section 3, constraint #4: This constraint should be removed so the
>>> spec. does not force users to include the schemaId in every EXI Options
>>> Document. This was discussed during the comment period for the Last Call
>>> Working Draft and there seemed to be general consensus applications should
>>> be able to choose whether to include the header options and schemaId in
>>> the Canonical EXI form [1][2].
>>
>>> Regarding the question posed in [1], I don’t believe we need a separate
>>> Canonical EXI Identifier to indicate the presence of the schemaId. The
>>> EXI Options document already has this capability, so the presence of the
>>> schemaId can be signaled in the same way other EXI options used for
>>> Canonicalization are signaled.
>>>
>>> [1] https://lists.w3.org/Archives/Public/public-exi-comments/2015Oct/0008.html
>>> [2] https://lists.w3.org/Archives/Public/public-exi-comments/2015Oct/0006.html
>>
>> We had discussions in the working group which concluded as follows:
>>
>> We will replace the statement "element schemaId MUST always be present" with "element schemaId SHOULD be present".
>> Further, we will add a warning note that speaks about why a schemaId should be present.
>
> Thank you for thinking about this issue. It’s an important one. I appreciate your willingness to relax the MUST to a SHOULD. I think this gets us closer to what is needed. However, it still implies that it is somehow incorrect or undesirable to omit the schema ID from the header.
>
> As we discussed during the previous comment period, there are more secure, effective and efficient ways to ensure the same schema is used between signers/validators and encoders/decoders. These are needed and exist independently of Canonical EXI. If users have a more secure, effective and/or efficient way to ensure the same schema is used, I don’t think we should tell them they SHOULD use a less secure, less effective and/or less efficient way. Nor do I think we should imply their way is somehow incorrect or undesirable.
>
> For those that are already using a more secure, effective and/or efficient method, including the schemaId in the header duplicates existing functionality, adds overhead and provides no benefit. I don’t believe Canonical EXI should try to dictate one, potentially duplicative, less efficient, less effective and/or less secure method for identifying schemas. We should let users design their architecture to be as reliable, efficient and secure as needed, especially if they really need a method that is more secure, reliable or efficient than that provided by the schemaId.
>
> Note: As a related issue, we don’t believe we need an EXI Canonicalization Identifier to indicate whether the schemaId is present in the options document. The options document already has an explicit way to signal whether the schemaId is included — just like it has a way to signal all the other normal EXI options. We should treat the schemaId just like all the other options that can be expressed in the EXI options document.

Certainly agree that there are out-of-band communication alternatives to efficiently communicate schemaID information between sites as part of EXI data exchange.  It can be frustrating to say "here is my schemaID, here is my schemaID" over and over again.

However there is an important use case of concern: later retrieval of EXI-encoded documents by a third party who was not privy to separate schema handshaking.

For such an archival retrieval case, inability to decode an EXI document due to lack of schemaID information, and thus the necessary schema, is a complete failure.

Users of EXI would not have the expectation that a schema-informed grammar (the most efficient and perhaps most common variation) would produce compressed XML that is later unreadable by anyone not knowing the correct schema. Any as-yet-unknown future users of the EXI documents would effectively be given an encrypted document with no key.

Hopefully the SHOULD with clear guidance can prevent such scenarios.  I still have concern, however, that we are enabling a fatal shortcoming for archival documents that, in practice, can't be fixed once it occurs.  We might well consider this possibility of failure to be unacceptable.

Strong warnings in a Recommendation are useful, but they do not help a future user trying to decode an EXI document which (for whatever reason) failed to follow the warnings.

Anything else we might do to preclude such a failure is important to consider.
- Perhaps recommended options for archival use?
- Perhaps facilitation of EXI compression of the schema itself, and optional inclusion within an EXI document?
- Perhaps a required warning from tools that perform such compression without including a schemaID?

[...]
>>> 18. Section 4.5.5: The absence of any canonicalization of Date-Time types
>>> introduces a few places where implementations might produce different EXI
>>> outputs for the same Infoset, breaking signatures. We recommend the following
>>> canonicalization rules be established for Date-Time types:
>>>
>>> •The Hour value used to compute the Time component MUST NOT be 24.
>>> •The optional FractionalSecs component MUST be omitted if its value is zero.
>>> •If the utcTime EXI Canonicalization option is set to true, Date-Time values
>>> must be represented using Coordinated Universal Time (UTC, sometimes called "Greenwich Mean Time").
>>
>>> Note: in accordance with our discussion during Last Call [3], there are some
>>> use cases that will need to preserve timezones (especially those using
>>> Canonical EXI for transport) and some that will need to normalize timezones
>>> to UTC. We recommend a utcTime option be added to the Canonical EXI
>>> specification, so we can satisfy both sets of users.
>>>
>>> The utcTime option may be expressed with a new Canonical EXI Identifier:
>>> http://www.w3.org/TR/exi-c14n#utcTime.
>>>
>>> [3] https://lists.w3.org/Archives/Public/public-exi-comments/2015Sep/0001.html
>>> (discussion under comment #12)
>>
>> I agree with the canonicalization rules for hour and FractionalSecs.
>
> Great. Thanks!
>
>> W.r.t. the last rule I wonder whether we need an additional statement about whether the optional component TimeZone MUST be omitted or zero.
>
> Good question. If we did this, the signature validator would not be able to detect whether something had added, removed or modified TimeZones in a given document. Adding, removing or modifying TimeZones is a significant change the validator should be able to detect during signature validation. So, I don’t think EXI Canonicalization should normalize the existence of the TimeZone component.
>
>> Moreover, at TPAC we decided to go with 6 possible choices (see https://lists.w3.org/Archives/Public/public-exi-comments/2015Oct/0008.html) and found it too complicated. Even if we remove the option for schemaId we still have 4 possibilities.
>>
>> 1. default (with EXI options and no datetime normalization)
>> 2. without EXI options
>> 3. with utcTime normalization
>> 4. without EXI options AND  with utcTime normalization
>>
>> Do you or anyone else have a good proposal how we can best indicate those four options. Text versions like http://www.w3.org/TR/exi-c14n#WithoutEXIOptionsAndWithDatetimeNormalization get pretty verbose. Using numbers or characters such as http://www.w3.org/TR/exi-c14n#A do not convey any meaning.
>
> This is another good question and an area where I think the spec. could use more definition. Right now, Appendix D.2 outlines a lot of options, but does not really provide a single, complete solution. We've spent some time thinking about this issue and have a proposal I'd like to share. Rather than clutter our feedback here, I will send a separate e-mail on this topic.

Simple response: we should strictly follow XML Schema canonicalization rules for time values.  Adding any special semantics or additional alternatives whatsoever only leads to difficulty.

Rephrase:  XML Schema, not EXI, should govern whether two time values are equivalent.  It has good words about the presence of a time zone and local application semantics.  It also defines a strict canonical representation for dateTime values.
https://www.w3.org/TR/xmlschema11-2/#dateTime
https://www.w3.org/TR/xmlschema11-2/#vp-dateTimeCanRep
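
For illustration only, here is a minimal Python sketch of the three Date-Time rules proposed in comment 18 above (no hour 24, no zero fractional seconds, optional UTC normalization).  It works on lexical dateTime strings rather than on the EXI Date-Time encoding, and it is not taken from either specification.  The function name, its parameters, the use of "Z" for a zero offset and the stripping of trailing zeros in the fraction are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def canonical_datetime(year, month, day, hour, minute, second,
                       microsecond=0, tz_offset_minutes=None, utc_time=False):
    """Hypothetical helper: apply the three proposed rules to one value."""
    # Rule 1: hour 24 is only legal as 24:00:00; rewrite it as 00:00:00
    # of the following day so the hour is never 24.
    roll_day = False
    if hour == 24:
        if minute or second or microsecond:
            raise ValueError("hour 24 is only valid as 24:00:00")
        hour, roll_day = 0, True

    tz = None
    if tz_offset_minutes is not None:
        tz = timezone(timedelta(minutes=tz_offset_minutes))
    value = datetime(year, month, day, hour, minute, second, microsecond,
                     tzinfo=tz)
    if roll_day:
        value += timedelta(days=1)

    # Rule 3 (optional): convert values that carry a time zone to UTC.
    # Values without a time zone are left untouched, so the presence or
    # absence of the TimeZone component is preserved.
    if utc_time and value.tzinfo is not None:
        value = value.astimezone(timezone.utc)

    text = (f"{value.year:04d}-{value.month:02d}-{value.day:02d}"
            f"T{value.hour:02d}:{value.minute:02d}:{value.second:02d}")

    # Rule 2: omit fractional seconds entirely when they are zero.
    # (Stripping trailing zeros is an assumption of this sketch.)
    if value.microsecond:
        text += "." + f"{value.microsecond:06d}".rstrip("0")

    # Append the time zone only if the value has one.
    if value.tzinfo is not None:
        total = int(value.utcoffset().total_seconds() // 60)
        if total == 0:
            text += "Z"
        else:
            sign = "+" if total > 0 else "-"
            total = abs(total)
            text += f"{sign}{total // 60:02d}:{total % 60:02d}"
    return text
```

For example, canonical_datetime(2016, 5, 18, 13, 44, 0, tz_offset_minutes=-420, utc_time=True) converts the local time to UTC, while the same call with utc_time=False keeps the -07:00 offset.  A value with no time zone gets no time zone added in either mode.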

all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman@nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman

Received on Thursday, 19 May 2016 14:57:23 UTC