Re: Canonical EXI - CR Review

Daniel,

Thank you for thinking about this and breaking it down a little more. I’m glad to hear you believe the statement Taki recommended will address case 3. It seems to me this statement also applies to cases 1 and 2, so it's not clear why we would force users to include the schemaId in the header in those cases.

When the user has a separate, better way to ensure the signer and validator use the same schemas, why would we still force them to duplicate the information in the header? What is the benefit of forcing users to include schema information in the header when this duplicates functionality they already have via another mechanism?

 Thanks again,

 John 

> On May 27, 2016, at 2:51 AM, Peintner, Daniel (ext) <daniel.peintner.ext@siemens.com> wrote:
> 
> Hi John, Taki, Don, all,
> 
> I agree that describing the issue is a good solution and a way forward. That said, I would still like to see a strong statement in certain cases. Let me try to describe those use-cases.
> 
> The schemaId in EXI conveys 3 different sets of schema information
> 
> 1. schemaId containing the xsi:nil attribute value set to true
>   -->  schema-less EXI streams
> 
> 2. schemaId value is empty
>  --> no user defined schema information;
>      however, the built-in XML schema types are available
> 
> 3. All other situations
>  --> user defined schema information
> 
> I think we should require "schemaId MUST be present" for 1. and 2. while describing the objective for 3.
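To make the three cases above concrete, here is a minimal sketch in Python. The names `SchemaInfo`, `classify_schema_id` and `must_be_present` are hypothetical and come from no EXI library; the sketch only models the case breakdown and the proposed requirement, not the actual header encoding.

```python
from enum import Enum
from typing import Optional


class SchemaInfo(Enum):
    SCHEMA_LESS = "schema-less EXI stream"            # case 1
    BUILT_IN_TYPES_ONLY = "built-in XML Schema types"  # case 2
    USER_DEFINED = "user-defined schema information"   # case 3


def classify_schema_id(present: bool, nil: bool = False,
                       value: Optional[str] = None) -> Optional[SchemaInfo]:
    """Return what a header schemaId conveys, or None when it is absent."""
    if not present:
        # Nothing is conveyed in-band; agreement on schemas has to be
        # ensured by some other mechanism.
        return None
    if nil:
        return SchemaInfo.SCHEMA_LESS           # xsi:nil set to true
    if not value:
        return SchemaInfo.BUILT_IN_TYPES_ONLY   # empty schemaId value
    return SchemaInfo.USER_DEFINED              # all other situations


def must_be_present(info: SchemaInfo) -> bool:
    """Proposal under discussion: MUST for cases 1 and 2, objective only for 3."""
    return info in (SchemaInfo.SCHEMA_LESS, SchemaInfo.BUILT_IN_TYPES_ONLY)
```

For example, `classify_schema_id(True, value="urn:example:schema")` falls under case 3, where `urn:example:schema` is a made-up identifier.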
> 
> Does this make sense?
> 
> Thanks,
> 
> -- Daniel
> 
> 
> ________________________________
> 
> From: John Schneider [john.schneider@agiledelta.com]
> Sent: Friday, 27 May 2016 00:06
> To: Takuki Kamiya
> Cc: Don Brutzman; Peintner, Daniel (ext) (CT RDA NEC EMB-DE); public-exi@w3.org
> Subject: Re: Canonical EXI - CR Review
> 
> Taki,
> 
> That sounds like a very reasonable approach for addressing this issue. I really like the idea of spelling out the objective rather than dictating a solution. Thanks for thinking about this!
> 
>        All the best!,
> 
>        John
> 
>> On May 23, 2016, at 2:51 PM, Takuki Kamiya <tkamiya@us.fujitsu.com> wrote:
>> 
>> Hi,
>> 
>> One thing that I think we need to consider is why we might want to describe
>> the presence of schemaId in the header options as "SHOULD".
>> 
>> I think the reason we originally made schemaId mandatory was out of security
>> concerns. Do we still have the same concern, so that we are trying to make it
>> a "SHOULD"-have?
>> 
>> Given that we already permit omitting the header options document
>> entirely, wouldn't that pose an even bigger security issue? Not labeling
>> this case as something "SHOULD NOT"-do seems inconsistent.
>> 
>> An alternative approach would be describing the caveat for not including
>> schemaId.
>> 
>> "Applications that use Canonical EXI need to ensure that the senders and
>> the receivers of EXI documents are using the same schema information.
>> This holds regardless of whether schemaId is included in the header options."
>> 
>> By mentioning that, we would not need to say schemaId is a "SHOULD"-have
>> because it is now a caveat for all use cases.
>> 
>> Takuki Kamiya
>> Fujitsu Laboratories of America
>> 
>> 
>> -----Original Message-----
>> From: John Schneider [mailto:john.schneider@agiledelta.com]
>> Sent: Thursday, May 19, 2016 3:46 PM
>> To: Don Brutzman
>> Cc: Peintner, Daniel (ext); public-exi@w3.org
>> Subject: Re: Canonical EXI - CR Review
>> 
>> Hi Don,
>> 
>> It's nice to hear from you! Thanks very much for the +1 and for taking the time to review our comments!
>> 
>> I agree there are some use cases, like archival use cases, where it is critical to include the schemaId in the EXI header. The Canonical EXI specification must support these use cases by allowing the schemaId in the header. If we were creating Canonical EXI just for these use cases, I might also agree the specification should say "the schemaId SHOULD [or even MUST] be included in the header." However, we're creating the Canonical EXI specification for a wide range of use cases, some of which require the schemaId and some of which are penalized by it. Therefore, I think we need to retain the same flexibility as EXI by providing the schemaId feature for those that need it, but not forcing it on those that don't.
>> 
>> I also agree it would be convenient if we could just reuse the canonical form of date times defined by XML Schema without thinking about it. Unfortunately, however, there is a subtle, but significant difference in the objective of XML Schema's canonical form and XML Signature's canonical form. The primary purpose of the XML Schema canonical form is to help determine whether two different dateTimes refer to the same instant. The primary purpose of the XML Signature canonical form is to determine whether any *significant* changes were made to the document after it was sent. And for some use cases, the timezone portion of the dateTime contains significant information, so a signature should be able to detect when this information has been changed. If we mandated the use of the XML Schema canonical form, the signature would not be able to detect changes to timezones and Canonical EXI would not work for these use cases.
>> 
>> All that said, there are also use cases that do not care about changes to timezone information. The proposed "utcTime" option supports these use cases by causing all times to be normalized to UTC. Note: when the value of the "utcTime" option is true, the EXI dateTime canonicalization matches the XML Schema dateTime canonicalization exactly.
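The distinction drawn above between "same instant" and "no significant change" can be illustrated with a small sketch using Python's standard `datetime` module. It works on lexical dateTime strings rather than on the EXI Date-Time encoding, and the sample values and helper names (`same_instant`, `to_utc`) are assumptions made for illustration only.

```python
from datetime import datetime, timezone


def same_instant(a: str, b: str) -> bool:
    """Equality in the XML Schema sense: do both values name the same instant?"""
    return datetime.fromisoformat(a) == datetime.fromisoformat(b)


def to_utc(lexical: str) -> str:
    """Normalize a timezoned value to UTC, as a utcTime-style option would."""
    return datetime.fromisoformat(lexical).astimezone(timezone.utc).isoformat()


sent = "2016-05-19T15:46:00-07:00"     # value as signed by the sender
altered = "2016-05-19T22:46:00+00:00"  # same instant, timezone rewritten

# The two values are the same instant, yet they are different documents.
print(same_instant(sent, altered))     # True
print(sent == altered)                 # False

# After UTC normalization the difference disappears, so a signature computed
# over the normalized form cannot detect that the timezone was changed.
print(to_utc(sent) == to_utc(altered))  # True
```

With normalization switched off, the two strings stay distinct and a signature would catch the rewrite; with it switched on, they collapse to one form, which suits use cases that do not care about timezone changes.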
>> 
>> I hope this helps to further explain the rationale behind these designs. Thanks again for your support and comments!
>> 
>>       Cheers!,
>> 
>>       John
>> 
>>> On May 19, 2016, at 7:56 AM, Don Brutzman <brutzman@nps.edu> wrote:
>>> 
>>> Yes +1 thanks for the close scrutiny and great progress.  One further response.
>>> 
>>> On 5/18/2016 1:44 PM, John Schneider wrote:
>>>> Daniel,
>>>> 
>>>> Thank you very much for taking the time to review our comments. I'm glad they were helpful! I've answered the questions you posed and provided some additional comments in-line below.
>>>> 
>>>>     All the best!,
>>>> 
>>>>     John
>>>> [...]
>>>> 
>>>>>> 8. Section 3, constraint #4: This constraint should be removed so the
>>>>>> spec. does not force users to include the schemaId in every EXI Options
>>>>>> Document. This was discussed during the comment period for the Last Call
>>>>>> Working Draft and there seemed to be general consensus applications should
>>>>>> be able to choose whether to include the header options and schemaId in
>>>>>> the Canonical EXI form [1][2].
>>>>> 
>>>>>> Regarding the question posed in [1], I don't believe we need a separate
>>>>>> Canonical EXI Identifier to indicate the presence of the schemaId. The
>>>>>> EXI Options document already has this capability, so the presence of the
>>>>>> schemaId can be signaled in the same way other EXI options used for
>>>>>> Canonicalization are signaled.
>>>>>> 
>>>>>> [1] https://lists.w3.org/Archives/Public/public-exi-comments/2015Oct/0008.html
>>>>>> [2] https://lists.w3.org/Archives/Public/public-exi-comments/2015Oct/0006.html
>>>>> 
>>>>> We had discussions in the working group which concluded as follows:
>>>>> 
>>>>> We will replace the statement "element schemaId MUST always be present" with "element schemaId SHOULD be present".
>>>>> Further, we will add a warning note that speaks about why a schemaId should be present.
>>>> 
>>>> Thank you for thinking about this issue. It's an important one. I appreciate your willingness to relax the MUST to a SHOULD. I think this gets us closer to what is needed. However, it still implies that it is somehow incorrect or undesirable to omit the schema ID from the header.
>>>> 
>>>> As we discussed during the previous comment period, there are more secure, effective and efficient ways to ensure the same schema is used between signers/validators and encoders/decoders. These are needed and exist independently of Canonical EXI. If users have a more secure, effective and/or efficient way to ensure the same schema is used, I don't think we should tell them they SHOULD use a less secure, less effective and/or less efficient way. Nor do I think we should imply their way is somehow incorrect or undesirable.
>>>> 
>>>> For those that are already using a more secure, effective and/or efficient method, including the schemaId in the header duplicates existing functionality, adds overhead and provides no benefit. I don't believe Canonical EXI should try to dictate one, potentially duplicative, less efficient, less effective and/or less secure method for identifying schemas. We should let users design their architecture to be as reliable, efficient and secure as needed, especially if they really need a method that is more secure, reliable or efficient than that provided by the schemaId.
>>>> 
>>>> Note: As a related issue, we don't believe we need an EXI Canonicalization Identifier to indicate whether the schemaId is present in the options document. The options document already has an explicit way to signal whether the schemaId is included - just like it has a way to signal all the other normal EXI options. We should treat the schemaId just like all the other options that can be expressed in the EXI options document.
>>> 
>>> Certainly agree that there are out-of-band communication alternatives to efficiently communicate schemaID information between sites as part of EXI data exchange.  It can be frustrating to say "here is my schemaID, here is my schemaID" over and over again.
>>> 
>>> However there is an important use case of concern: later retrieval of EXI-encoded documents by a third party who was not privy to separate schema handshaking.
>>> 
>>> For such an archival retrieval case, inability to decode an EXI document due to lack of schemaID information, and thus the necessary schema, is a complete failure.
>>> 
>>> Users of EXI would not have the expectation that a schema-informed grammar (the most efficient and perhaps most common variation) would produce compressed XML that is later unreadable by anyone not knowing the correct schema. Any as-yet-unknown future users of the EXI documents would effectively be given an encrypted document with no key.
>>> 
>>> Hopefully the SHOULD with clear guidance can prevent such scenarios.  I still have concern however that we are enabling a fatal shortcoming for archival documents that, in practice, can't be fixed once it occurs.  We might well consider this possibility of failure to be unacceptable.
>>> 
>>> Strong warnings in a Recommendation are useful, but they do not help a future user trying to decode an EXI document which (for whatever reason) failed to follow the warnings.
>>> 
>>> Anything else we might do to preclude such a failure is important to consider.
>>> -  Perhaps recommended options for archival use?
>>> - Perhaps facilitation of EXI compression of the schema itself, and optional inclusion within an EXI document?
>>> - Perhaps a required warning by tools that perform such compression without a warning?
>>> 
>>> [...]
>>>>>> 18. Section 4.5.5: The absence of any canonicalization of Date-Time types
>>>>>> introduces a few places where implementations might produce different EXI
>>>>>> outputs for the same Infoset, breaking signatures. We recommend the following
>>>>>> canonicalization rules be established for Date-Time types:
>>>>>> 
>>>>>> *The Hour value used to compute the Time component MUST NOT be 24.
>>>>>> *The optional FractionalSecs component MUST be omitted if its value is zero.
>>>>>> *If the utcTime EXI Canonicalization option is set to true, Date-Time values
>>>>>> must be represented using Coordinated Universal Time (UTC, sometimes called "Greenwich Mean Time").
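The three proposed rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's algorithm: the function name `canonical_datetime` and its parameters are hypothetical, the fractional seconds are modelled as a plain digit string rather than the EXI FractionalSecs encoding, hour 24 is read in the XML Schema sense (24:00:00 equals 00:00:00 of the following day), and values that carry no timezone are left untouched by the utcTime rule.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def canonical_datetime(year: int, month: int, day: int,
                       hour: int, minute: int, second: int,
                       fraction: str = "",
                       tz_minutes: Optional[int] = None,
                       utc_time: bool = False) -> Tuple[datetime, Optional[str]]:
    """Return (value, fractional digits or None) after applying the rules."""
    tz = timezone(timedelta(minutes=tz_minutes)) if tz_minutes is not None else None

    # Rule 1: the hour must not be 24; roll 24:00:00 over to the next day.
    if hour == 24:
        value = datetime(year, month, day, 0, minute, second, tzinfo=tz)
        value += timedelta(days=1)
    else:
        value = datetime(year, month, day, hour, minute, second, tzinfo=tz)

    # Rule 2: omit the fractional seconds when their value is zero.
    frac = fraction if fraction.strip("0") else None

    # Rule 3: with utcTime set, represent timezoned values in UTC.
    # Assumption: a value without a timezone is not given one.
    if utc_time and tz is not None:
        value = value.astimezone(timezone.utc)

    return value, frac
```

For instance, `canonical_datetime(2016, 5, 18, 24, 0, 0)` yields midnight on 19 May 2016 with no fractional part.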
>>>>> 
>>>>>> Note: in accordance with our discussion during Last Call [3], there are some
>>>>>> use cases that will need to preserve timezones (especially those using
>>>>>> Canonical EXI for transport) and some that will need to normalize timezones
>>>>>> to UTC. We recommend a utcTime option be added to the Canonical EXI
>>>>>> specification, so we can satisfy both sets of users.
>>>>>> 
>>>>>> The utcTime option may be expressed with a new Canonical EXI Identifier:
>>>>>> http://www.w3.org/TR/exi-c14n#utcTime.
>>>>>> 
>>>>>> [3] https://lists.w3.org/Archives/Public/public-exi-comments/2015Sep/0001.html
>>>>>> (discussion under comment #12)
>>>>> 
>>>>> I agree with the canonicalization rules for hour and FractionalSecs.
>>>> 
>>>> Great. Thanks!
>>>> 
>>>>> W.r.t. the last rule I wonder whether we need an additional statement about whether the optional component TimeZone MUST be omitted or zero.
>>>> 
>>>> Good question. If we did this, the signature validator would not be able to detect whether something had added, removed or modified TimeZones in a given document. Adding, removing or modifying TimeZones is a significant change the validator should be able to detect during signature validation. So, I don't think EXI Canonicalization should normalize the existence of the TimeZone component.
>>>> 
>>>>> Moreover, at TPAC we decided to go with 6 possible choices (see https://lists.w3.org/Archives/Public/public-exi-comments/2015Oct/0008.html) and found it too complicated. Even if we remove the option for schemaId we still have 4 possibilities.
>>>>> 
>>>>> 1. default (with EXI options and no datetime normalization)
>>>>> 2. without EXI options
>>>>> 3. with utcTime normalization
>>>>> 4. without EXI options AND  with utcTime normalization
>>>>> 
>>>>> Do you or anyone else have a good proposal for how we can best indicate those four options? Text versions like http://www.w3.org/TR/exi-c14n#WithoutEXIOptionsAndWithDatetimeNormalization get pretty verbose. Using numbers or characters such as http://www.w3.org/TR/exi-c14n#A does not convey any meaning.
>>>> 
>>>> This is another good question and an area where I think the spec. could use more definition. Right now, Appendix D.2 outlines a lot of options, but does not really provide a single, complete solution. We've spent some time thinking about this issue and have a proposal I'd like to share. Rather than clutter our feedback here, I will send a separate e-mail on this topic.
>>> 
>>> Simple response: we strictly follow XML Schema canonicalization rules for time values.  Adding any special semantics or additional alternatives whatsoever only leads to difficulty.
>>> 
>>> Rephrase:  XML Schema should govern whether 2 time values are equivalent, not EXI.  There are good words in there about presence of time zone and local application semantics.  There is also a strict canonical representation regarding dateTime values.
>>> https://www.w3.org/TR/xmlschema11-2/#dateTime
>>> https://www.w3.org/TR/xmlschema11-2/#vp-dateTimeCanRep
>>> 
>>> all the best, Don
>>> --
>>> Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman@nps.edu
>>> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
>>> X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
>> 
>> AgileDelta, Inc.
>> john.schneider@agiledelta.com
>> http://www.agiledelta.com
>> 
>> 

Received on Wednesday, 1 June 2016 22:45:58 UTC