Re: Support for Canonical EXI interoperability test in TTFMS

At the end of the third paragraph below, I meant to say “I don’t believe those that use them should be penalized.” Amazing what the word “not” can do to the meaning of a sentence. :-)

> On Dec 4, 2015, at 4:35 PM, John Schneider <john.schneider@agiledelta.com> wrote:
> 
> Taki,
> 
> Yes, that’s right. An application will generally interpret the value of an empty element of type xs:string as an empty string. However, that does not mean XML or EXI must include an explicit representation for empty strings. Indeed, XML does not provide an explicit representation for empty strings. Nor do the commonly used XML parsing APIs. They generally represent an empty string implicitly as a Start Element followed directly by an End Element (e.g., SAX, StAX, XMLReader/XMLWriter) or as a node with zero children (in the case of the DOM). As such, XML applications are generally programmed to interpret the lack of simple element content as an empty value.
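
The implicit representation described above can be observed with the SAX and DOM parsers in Python's standard library. This is only a minimal sketch: the `EventRecorder` class and the helper function names are made up for illustration, and the EXI-style event labels are plain strings, not the output of any EXI processor.

```python
import xml.sax
from xml.dom import minidom


class EventRecorder(xml.sax.ContentHandler):
    """Records SAX callbacks using EXI-style event labels (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(f"SE({name})")

    def characters(self, content):
        # Only called when the parser actually reports character data.
        self.events.append(f"CH({content!r})")

    def endElement(self, name):
        self.events.append("EE")


def sax_events(xml_text):
    """Return the list of recorded events for an XML string."""
    recorder = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), recorder)
    return recorder.events


def dom_child_count(xml_text):
    """Return the number of child nodes of the document element."""
    return len(minidom.parseString(xml_text).documentElement.childNodes)


if __name__ == "__main__":
    for doc in ("<foo></foo>", "<foo/>"):
        print(doc, sax_events(doc), dom_child_count(doc))
```

For both `<foo></foo>` and `<foo/>`, the recorder sees a start-element callback followed directly by an end-element callback, and the DOM element has zero children. The application is left to interpret that as an empty string.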
> 
> So, I totally agree with you about how applications generally represent the value of empty elements of type xs:string. However, this is a separate concern from how empty strings should be represented in XML or EXI. XML developers are generally more familiar with the implicit representation of empty strings than an explicit one. 
> 
> Regarding compactness, there are some use cases that make heavy use of empty strings and some that don’t. It is incorrect to assume they are not used and therefore the compactness of their representation does not matter. I don’t believe those that use them should not be penalized. 
> 
> The simpler alternative I suggested provides the most compact representation for those that use empty strings and does not penalize those that do not. In addition, it improves processing efficiency and reduces implementation complexity for all users. I’m not aware of any use cases it penalizes. 
> 
>  Cheers,
> 
>  John   
> 
> 
>> On Dec 4, 2015, at 2:53 PM, Takuki Kamiya <tkamiya@us.fujitsu.com> wrote:
>> 
>> Hi,
>> 
>> When you have an element of type xs:string such as the following,
>> the element <foo> is associated with a semantics that it is a string
>> that is between the start and end tag.
>> 
>> <element name="foo" type="xs:string" />
>> 
>> According to that semantics, regardless of the representation differences
>> (i.e. <foo></foo> and <foo/>), its content needs to be interpreted as
>> an empty string "".
>> 
>> Therefore, the difference between having "" consistently (as Daniel phrased it)
>> vs. having "" only when necessary is a choice between making that always
>> explicit vs. mostly implicit.
>> 
>> However, in reality, I do not see tangible compactness difference because
>> I am not sure whether empty strings, empty binary values are often used
>> where it is typed with simple datatypes.
>> 
>> Thank you,
>> 
>> Takuki Kamiya
>> Fujitsu Laboratories of America
>> 
>> 
>> -----Original Message-----
>> From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
>> Sent: Thursday, December 03, 2015 8:16 AM
>> To: John Schneider
>> Cc: Takuki Kamiya; public-exi@w3.org
>> Subject: AW: Support for Canonical EXI interoperability test in TTFMS
>> 
>> John,
>> 
>> Thank you very much for sharing your view and thoughts.
>> 
>> I like your proposal to add a new subsection in 4.2 (EXI Event Selection) to "4.2.x Exclude extraneous events".
>> 
>> Please see my other relevant comments below.
>> 
>>>> * my proposal tries to be consistent between strict and non-strict and
>>>> tries to achieve the schema designer's intent
>>> 
>>> I'm not sure what this means. If my proposal does not achieve the schema
>>> designer's intent, I would definitely be interested in understanding that.
>> 
>> Please let me try to explain why I believe that your proposal does not achieve the schema designer's intent.
>> 
>> I think a proposal that matches the intent should come up with the same EXI events for a given set of XML information. That said, it should behave the same no matter whether strict is on or not. If one needs to differentiate we can hardly argue it is clear...
>> 
>> Let me give you a simple XML schema example that defines an element foo with type xs:string.
>> 
>> <element name="foo" type="xs:string" />
>> 
>> The intent is clearly to expect any character (xs:string) data.
>> 
>> That said, the event sequence
>> 
>> SE(foo) EE
>> 
>> does not correspond to the schema's intent while
>> 
>> SE(foo) CH("") EE
>> 
>> does match the schema constraints.
>> 
>> In my proposal it is always the same event sequence and that's what I mean by being "consistent".
>> 
>> Thanks,
>> 
>> -- Daniel
>> 
>> 
>> 
>> 
>> ________________________________
>> From: John Schneider [john.schneider@agiledelta.com]
>> Sent: Friday, 20 November 2015 02:58
>> To: Peintner, Daniel (ext)
>> Cc: Takuki Kamiya; public-exi@w3.org
>> Subject: Re: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Daniel,
>> 
>> You're welcome. Thank you for taking the time to review it and send your questions. Please see my comments in green below.
>> 
>> On Nov 19, 2015, at 12:14 AM, Peintner, Daniel (ext) <daniel.peintner.ext@siemens.com> wrote:
>> 
>> Hi John, all,
>> 
>> thank you very much for your proposal. I think it is clear and concise which is very good!
>> 
>> The proposal seems to assume certain aspects such as
>> * in strict mode an implementation needs to generate an empty CH if it wants to
>> 
>> When strict is on and the current grammar does not permit an EE event, EXI only provides one way to encode empty element content - i.e., CH("") followed by EE. Since EXI only provides one way to handle this situation, Canonical EXI does not need to say anything additional about it. All EXI processors must handle this sequence the same way (or fail to encode the document).
>> 
>> The general purpose of Canonical EXI is to identify areas where EXI permits alternate ways to represent the same thing and prescribe one of them. This is not one of those cases.
>> 
>> be able to generate an EXI stream
>> * the XML Infoset does not contain empty CH events
>> (or the processor needs to strip them?)
>> 
>> 
>> I think one of the reasons the descriptions were starting to get a little complicated is because you're trying to describe what an implementation must do to create a Canonical EXI stream rather than just describing the contents of a Canonical EXI stream. If you can unambiguously describe the required output, you can leave the implementation details up to the developers.
>> 
>> In this particular case, we don't need to describe every possible Infoset input and describe all the different things the EXI processor must do to create a valid Canonical EXI output from its input. We just need to describe the valid Canonical EXI output. The processor can determine whether they need to insert a CH("") or strip a CH("") to create this output.
>> 
>> 
>> Further, I am not sure about mixed content. Let's suppose having an element defined as follows
>> 
>> <xs:element name='el1'>
>> <xs:complexType mixed='true'>
>>  <xs:sequence>
>>    <xs:element name='t' type='xs:string' minOccurs='0'/>
>>  </xs:sequence>
>> </xs:complexType>
>> </xs:element>
>> 
>> When encoding <el1> there can be various valid sequences
>> 
>> a) SE(el1) CH("") SE(t) EE CH("") EE
>> b) SE(el1) SE(t) EE CH("") EE
>> c) SE(el1) CH("") SE(t) EE EE
>> d) SE(el1) SE(t) EE EE
>> ...
>> 
>> I don't think this is actually handled/defined in your proposal?
>> Am I correct?
>> 
>> This is a different issue that needs to be addressed separately. It is not actually specific to mixed content. When strict is false, the EXI grammar allows one to insert extraneous CH("") events almost anywhere. An EXI processor could even output a long sequence of consecutive CH("") events between start elements. Canonical EXI needs a rule that prohibits processors from including extraneous events that are not required to faithfully represent the XML document or conform with the EXI grammar.
>> 
>> Note that CH("") events are not part of the XML infoset. Infoset Character Information Items must contain characters. There is no Infoset Representation for the "lack of character data." I'm not actually aware of any XML parsers that will generate extraneous CH("") events like the ones you illustrate above, nor do I see any reason why anyone would want to insert them. However, since the EXI grammar allows this option, Canonical EXI needs to restrict it.
>> 
>> I would recommend adding the following rule to section 4.2 (EXI Event Selection):
>> 
>> "4.2.x Exclude extraneous events
>> 
>> The EXI grammars permit EXI processors to include extraneous CH("") events that are not required by the grammar and do not change the XML Infoset. Canonical EXI MUST exclude extraneous CH("") events unless they are required by the EXI grammar (see section [reference the "empty element content" text I provided below])."
>> 
>> [Note: the only time the EXI grammars require an extraneous CH("") event is for encoding empty element content when strict is true.]
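
A rough Python sketch of the proposed 4.2.x rule follows, applied to the mixed-content sequences a) to d) listed earlier in this message. Events are modeled as plain strings, and `is_required` is a hypothetical callback standing in for the grammar check (it would return true only for the strict-mode empty element content case). Neither name comes from the EXI specification or any existing EXI library.

```python
EMPTY_CH = 'CH("")'


def strip_extraneous_ch(events, is_required=None):
    """Drop CH("") events unless the (hypothetical) grammar check requires them."""
    kept = []
    for index, event in enumerate(events):
        required = is_required is not None and is_required(events, index)
        if event == EMPTY_CH and not required:
            continue  # extraneous: not needed by the grammar, no Infoset change
        kept.append(event)
    return kept


# The four non-strict sequences a) to d) from the mixed-content example.
VARIANTS = [
    ["SE(el1)", EMPTY_CH, "SE(t)", "EE", EMPTY_CH, "EE"],
    ["SE(el1)", "SE(t)", "EE", EMPTY_CH, "EE"],
    ["SE(el1)", EMPTY_CH, "SE(t)", "EE", "EE"],
    ["SE(el1)", "SE(t)", "EE", "EE"],
]

if __name__ == "__main__":
    for variant in VARIANTS:
        print(strip_extraneous_ch(variant))
```

With no required events, all four variants reduce to the same sequence, which is variant d).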
>> 
>> 
>> The two proposals we currently have are different in the following way
>> * your proposal tries to avoid any "additional" processing
>> 
>> It also improves compactness and strives to reduce implementation complexity.
>> 
>> * my proposal tries to be consistent between strict and non-strict and
>> tries to achieve the schema designer's intent
>> 
>> I'm not sure what this means. If my proposal does not achieve the schema designer's intent, I would definitely be interested in understanding that.
>> 
>> 
>> I am unsure which one is the way to go...
>> 
>> Thanks,
>> 
>> -- Daniel
>> 
>> 
>> 
>> 
>> 
>> ________________________________
>> From: John Schneider [john.schneider@agiledelta.com]
>> Sent: Wednesday, 18 November 2015 21:26
>> To: Peintner, Daniel (ext)
>> Cc: Takuki Kamiya; public-exi@w3.org
>> Subject: Re: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Daniel, all,
>> 
>> I've been following the discussion regarding Canonical EXI's treatment of empty elements and would like to offer a suggestion to simplify the wording and improve the efficiency of the proposed solution.
>> 
>> Here is what I would propose:
>> 
>> "When strict is false or the current element grammar contains a production of the form LeftHandSide : EE with event code of length 1, EXI can represent the content of an empty element explicitly as an empty CH event or implicitly as a SE event immediately followed by an EE event. In these circumstances, Canonical EXI MUST represent an empty element by a SE event followed by an EE event."
>> 
>> I think this description states the issue and the alternate solution simply and clearly. The alternate solution improves compactness by prescribing the most efficient representation of an empty character event when it is available (i.e., by omitting the CH event). It improves processing efficiency by requiring only 1-2 checks (strict & available EE) and does not require knowledge or checking against DTR types. These checks occur in a relatively hot code path, so minimizing overhead is important for efficiency. Because the alternate approach does not depend on DTR knowledge, it also avoids the need to describe how to handle user defined DTRs that can also encode empty strings (which the current proposal does not address).
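
The one or two checks mentioned above could look roughly like the following Python sketch. The function name and both boolean parameters are hypothetical. A real EXI processor would derive them from its options and from the current element grammar.

```python
from typing import List


def canonical_empty_element(strict: bool, ee_len1_available: bool) -> List[str]:
    """Pick the event sequence for an empty element under the proposed rule.

    strict:            value of the EXI "strict" option.
    ee_len1_available: True if the current element grammar has a production
                       LeftHandSide : EE with an event code of length 1.
    """
    if not strict or ee_len1_available:
        # Implicit representation: SE immediately followed by EE.
        return ["SE", "EE"]
    # Strict mode without an EE production: CH("") is the only valid encoding.
    return ["SE", 'CH("")', "EE"]


if __name__ == "__main__":
    for strict in (False, True):
        for ee in (False, True):
            print(strict, ee, canonical_empty_element(strict, ee))
```

Note that the sketch never inspects datatype representations, which is the point of the alternate approach.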
>> 
>> I hope this is helpful. Please let me know if you have questions or if I've missed anything important.
>> 
>> Best wishes!,
>> 
>> John
>> 
>> On Nov 18, 2015, at 8:39 AM, Peintner, Daniel (ext) <daniel.peintner.ext@siemens.com> wrote:
>> 
>> Hi Taki, all,
>> 
>> Thank you for your reply and your valuable comments.
>> 
>> I updated the proposal to incorporate your feedback. Also, the description now states the intent and lists again the rules.
>> 
>> 
>> --->
>> 
>> In general, Canonical EXI MUST NOT change the sequence of XML information items. However, the XML Infoset in some rare cases (e.g., due to API characteristics) may miss "Character Information items" such as strings with the number of characters equal to 0 (zero). EXI encoding may also fail without such an "empty" character information item (e.g., strict schema-informed streams that state the requirement of an expected character string - even if empty).
>> 
>> Hence, Canonical EXI aims to add an "empty" character information item only if the intent requires it (e.g., an expected character string) and not in any other use case (e.g., mixed content).
>> 
>> That said, a canonical EXI processor MUST add a CH event with a String of length 0 (zero)
>> * if processing the current XML Information item fails by means of existing event codes
>> of length 1 (i.e., no EE or SE event exists), and
>> * when processing a schema-informed grammar where a CH event code of length 1 exists with
>> Built-in EXI Datatype Representation "Binary" (exi:base64Binary and exi:hexBinary),
>> "String", "List" or an Enumeration with an empty item.
>> 
>> In all other cases, further events MUST NOT be added.
>> <---
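
The two conditions in the rule above can be read as a single predicate. The Python sketch below is one possible reading, not an implementation taken from EXIficient or OpenEXI. The function name, its parameters, and the string labels for the datatype representations are all assumptions made for illustration.

```python
from typing import Optional

# Built-in datatype representations named in the rule that can encode an empty value.
EMPTY_CAPABLE = {"Binary", "String", "List"}


def must_add_empty_ch(
    has_len1_ee_or_se: bool,
    len1_ch_datatype: Optional[str],
    enumeration_has_empty_item: bool = False,
) -> bool:
    """Return True if the rule would require inserting CH("").

    has_len1_ee_or_se: an EE or SE event code of length 1 can process the item.
    len1_ch_datatype:  datatype representation of the length-1 CH event in a
                       schema-informed grammar, or None if there is no such event.
    """
    if has_len1_ee_or_se:
        return False  # processing succeeds without any extra event
    if len1_ch_datatype is None:
        return False  # no length-1 CH event to carry the empty value
    if len1_ch_datatype in EMPTY_CAPABLE:
        return True
    return len1_ch_datatype == "Enumeration" and enumeration_has_empty_item


if __name__ == "__main__":
    print(must_add_empty_ch(False, "String"))
    print(must_add_empty_ch(False, "Enumeration", enumeration_has_empty_item=True))
    print(must_add_empty_ch(True, "String"))
```

Compared with a check based only on strict and an available EE, this predicate depends on knowing the datatype representation at the current grammar position.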
>> 
>> What do you think?
>> Do you have any updates/proposals?
>> 
>> Thanks,
>> 
>> -- Daniel
>> 
>> 
>> 
>> ________________________________
>> From: Takuki Kamiya [tkamiya@us.fujitsu.com]
>> Sent: Wednesday, 11 November 2015 22:40
>> To: Peintner, Daniel (ext); public-exi@w3.org
>> Subject: RE: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Hi Daniel,
>> 
>> In schema-informed context, CH event-type with event-code length 1 comes from
>> two different schema constructs. One is from simple type content, the other is
>> from mixed-content.
>> 
>> For CH event types that came from mixed-content, there is no need to insert
>> an empty CH event. Therefore, I would suggest excluding mixed-content CH event
>> types from the rule you described below.
>> 
>> You listed three EXI datatype representations (i.e. Binary, String and List) as
>> applicable to the described empty CH event insertion rule. I would like to point
>> out that enumerated values where one of the values is an empty string (i.e. "")
>> should also apply. In other words, in every context where the EXI datatype
>> representation associated with the current CH event type allows for an empty CH,
>> an empty CH event should be inserted.
>> 
>> Thanks,
>> 
>> taki
>> 
>> 
>> -----Original Message-----
>> From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
>> Sent: Wednesday, November 11, 2015 5:08 AM
>> To: Takuki Kamiya; public-exi@w3.org
>> Subject: AW: Support for Canonical EXI interoperability test in TTFMS
>> 
>> All,
>> 
>> Following yesterday's telecon, I explored the empty CH("") event a bit further.
>> 
>> There are various situations when an empty CH could be added. One rather obvious case is a schema-informed stream that states the requirements of an expected character string (even if the string is empty). However, also in schema-less mode one could assume that a previously "learned" CH event could mean that a CH is expected even if it is not there...
>> 
>> To summarize, I would like to propose the following requirement/addition to the Canonical EXI document.
>> 
>> --->
>> The XML Infoset in some rare cases (e.g., due to API characteristics) may miss "Character Information items" such as strings with the number of characters equal to 0 (zero). That said, EXI encoding may also fail without such an "empty" character information item. Hence, a canonical EXI processor MUST add a CH event with a String of length 0 (zero), if not already there, when being in a schema-informed grammar where a CH event code of length 1 exists with Built-in EXI Datatype Representation "Binary" (exi:base64Binary and exi:hexBinary), "String" or "List". The availability of such a CH event in the grammar clearly states the intent, in this case the requirement of empty characters. In all other cases, further events MUST NOT be added.
>> <---
>> 
>> What do people think?
>> 
>> Thanks,
>> 
>> -- Daniel
>> 
>> 
>> 
>> 
>> 
>> ________________________________
>> From: Peintner, Daniel (ext) [daniel.peintner.ext@siemens.com]
>> Sent: Monday, 9 November 2015 17:06
>> To: Takuki Kamiya; public-exi@w3.org
>> Subject: AW: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Taki, all,
>> 
>> we looked into the issue more closely and found the following issues.
>> 
>> 1. How to deal with conflicting framework options
>> 
>> The framework (or the associated test cases) may define conflicting parameters (e.g., preserve processing instructions and strict). In such a situation an EXI processor may decide whether to use non-strict encoding to support processing instructions or to eliminate PI support.
>> 
>> As it turns out the EXI processors (OpenEXI and EXIficient) tend to use different strategies. That said, both strategies are OK. Hence, I think we need to make the framework aware of such a situation so that the framework decides what is the desired result.
>> 
>> 
>> 2. Empty CH("") events
>> 
>> An XML schema may define an element as follows
>> <xs:element name="foo" type="xs:string"/>
>> 
>> A valid instance may look as follows.
>> 
>> <foo></foo>
>> 
>> Depending on the EXI options and the mode (strict vs. non-strict) the following two EXI streams are possible
>> 
>> SE(foo) EE(foo)                --> applicable in non-strict only
>> SE(foo) CH("") EE(foo)      --> applicable in strict and non-strict
>> 
>> Again, we need to ensure all Canonical EXI processors behave the same.
>> Hence, I would argue for the latter case given that it is usable in both (strict and non-strict) scenarios, but I am open to other ideas/thoughts.
>> 
>> 3. Whitespace handling
>> 
>> I wonder whether we need to define whitespace preservation rules in Canonical EXI similar to the TTFMS framework rules.
>> 
>> 
>> Thanks,
>> 
>> -- Daniel
>> 
>> 
>> 
>> 
>> 
>> ________________________________
>> From: Takuki Kamiya [tkamiya@us.fujitsu.com]
>> Sent: Wednesday, 21 October 2015 00:05
>> To: Peintner, Daniel (ext); public-exi@w3.org
>> Subject: RE: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Hi Daniel,
>> 
>> I fixed a bug in the TTFMS framework.
>> 
>> Next time you compile the framework and run the test,
>> you will be able to see schema-informed EXI files generated
>> when the test case provides one and schema use is enabled.
>> 
>> Thank you,
>> 
>> Takuki Kamiya
>> Fujitsu Laboratories of America
>> 
>> 
>> -----Original Message-----
>> From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
>> Sent: Wednesday, October 14, 2015 6:50 AM
>> To: Takuki Kamiya; public-exi@w3.org
>> Subject: AW: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Hi Taki,
>> 
>> I uploaded a revised EXIficient library but I agree, I do still see some issues.
>> (in my test run 20 files out of 115 are still different)
>> 
>> Maybe this has to do with whitespace handling (will send separate email...)
>> 
>> Moreover, I am currently able to run schema-less test runs only by calling
>> ant run-iot-c14n-classes -DtestCases=config/testCases-restricted/all-v1.xml
>> 
>> Maybe someone can point me to the configuration for running schema-informed or byteAligned test runs to facilitate debugging.
>> 
>> Thanks,
>> 
>> -- Daniel
>> 
>> 
>> 
>> ________________________________
>> From: Takuki Kamiya [tkamiya@us.fujitsu.com]
>> Sent: Tuesday, 13 October 2015 03:23
>> To: Peintner, Daniel (ext); public-exi@w3.org
>> Subject: RE: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Hi Daniel,
>> 
>> I also modified the openexi driver so that it always outputs header options.
>> 
>> However, I still see many differences between exificient and openexi
>> outputs. We will need to further investigate this.
>> 
>> Thank you,
>> 
>> Takuki Kamiya
>> Fujitsu Laboratories of America
>> 
>> 
>> -----Original Message-----
>> From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
>> Sent: Thursday, October 01, 2015 5:50 AM
>> To: Takuki Kamiya; public-exi@w3.org
>> Subject: AW: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Hi Taki,
>> 
>> Thank you for pointing me to the parameter "measure" which indicates the type of the test run.
>> 
>> I also uploaded a first snapshot of the EXIficient library supporting Canonical EXI. Additional updates may be necessary.
>> When comparing the encoded files with OpenEXI I do see mostly diffs. I think it is because OpenEXI at the moment does not always include the EXI Options.
>> 
>> Please let me know if you encounter other issues.
>> 
>> Thanks,
>> 
>> -- Daniel
>> 
>> ________________________________
>> From: Takuki Kamiya [tkamiya@us.fujitsu.com]
>> Sent: Thursday, 1 October 2015 01:45
>> To: Peintner, Daniel (ext); public-exi@w3.org
>> Subject: RE: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Hi Daniel,
>> 
>> You should be able to get the test mode by accessing:
>> measure field (of class MeasureParam) that is in _driverParams (of class DriverParameters)
>> 
>> When it is iot_c14n_encode, you should change the behavior of the
>> processor to comply with c14n rules.
>> 
>> Do you plan to check-in new EXIficient jar to TTFMS soon?
>> 
>> Thank you,
>> 
>> Takuki Kamiya
>> Fujitsu Laboratories of America
>> 
>> 
>> -----Original Message-----
>> From: Peintner, Daniel (ext) [mailto:daniel.peintner.ext@siemens.com]
>> Sent: Wednesday, September 30, 2015 8:28 AM
>> To: Takuki Kamiya; public-exi@w3.org
>> Subject: AW: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Hi Taki,
>> 
>> I did check out the new code and it worked as expected.
>> Thank you for your work!
>> 
>> The only thing I miss is a testCase option that informs about whether the EXI processor is required to produce canonical EXI.
>> 
>> Did I miss anything in that regard?
>> 
>> Thanks,
>> 
>> -- Daniel
>> 
>> 
>> 
>> P.S. EXIficient does not sort attributes in schema-less mode
>> 
>> 
>> ________________________________
>> From: Takuki Kamiya [tkamiya@us.fujitsu.com]
>> Sent: Tuesday, 29 September 2015 02:20
>> To: public-exi@w3.org
>> Subject: Support for Canonical EXI interoperability test in TTFMS
>> 
>> Hi,
>> 
>> I added support for Canonical EXI interoperability test in TTFMS.
>> 
>> You need to invoke target "run-iot-c14n-classes" in order to run the
>> encoding process.
>> 
>> After that, diff tools such as WinMerge (on windows) can be used to
>> compare the encoded files output by various implementations.
>> 
>> Initial experimental run showed quite a lot of differences in encodings
>> between EXIficient and OpenEXI.
>> 
>> I found at least some of the diffs are due to the attribute orders in
>> schema-less setting. Is it true that EXIficient sorts attributes whether
>> it is schema-less or schema-informed?
>> 
>> Thank you,
>> 
>> Takuki Kamiya
>> Fujitsu Laboratories of America
>> 
>> 
>> 
>> 
>> 
>> AgileDelta, Inc.
>> john.schneider@agiledelta.com
>> http://www.agiledelta.com
>> w: 425-644-7122
>> m: 425-503-3403
>> f: 425-644-7126
>> 
> 

AgileDelta, Inc.
john.schneider@agiledelta.com
http://www.agiledelta.com
w: 425-644-7122
m: 425-503-3403
f: 425-644-7126

Received on Saturday, 5 December 2015 05:40:52 UTC