- From: Alessandro Triglia <sandro@oss.com>
- Date: Tue, 22 Dec 2015 15:02:06 -0500
- To: <public-exi@w3.org>
Hi,

I, too, prefer approach B. When the EXI stream encoder receives an EE event from the user application, it can easily decide, just by looking at the current grammar, whether it is necessary to generate an extra event (a CH "") before the EE event. The extra event will be necessary in strict mode and will not be necessary in non-strict mode, and therefore the events generated in the two modes will be different. I don't see any problem with this.

By the way, are you guys also considering the case of a user application actually sending a CH "" followed by an EE? I would think in this case the canonical EXI encoding should be the same as if the application had sent just an EE and no CH. This is because the underlying infoset in the two cases is the same.

More broadly, I am also wondering what happens in the case of an application sending a series of CH events instead of a single larger one. For example, for an element <e>Jonathan</e>, the application could send

    SE CH "" CH "John" CH "athan" CH "" CH "" EE

This is regardless of the schema-informed or schemaless mode in use.

From the infoset point of view, the character information item children of the element information item are just J o n a t h a n. So whether the application sends SE CH "" CH "John" CH "athan" EE or SE CH "Jonathan" EE, the *canonical* EXI encoding should be the same, no?

It is unclear to me what is really meant by "canonical" in "Canonical" EXI. Is Canonical EXI intended to be "canonical" with respect to the infoset (meaning that given a particular infoset and a set of EXI options, the resulting EXI stream must be completely determined)? Is it intended to be "canonical" with respect to a source of input EXI events coming from an application (meaning that given a particular series of input EXI events and a set of EXI options, the resulting EXI stream must be completely determined)?

The draft uses the phrase "logically equivalent within an application context", but I don't understand what that means. The EXI Recommendation says, "Each event in an EXI stream participates in a mapping system that relates events to XML Information Items so that an EXI document or an EXI fragment as a whole serves to represent an XML Information Set", but then there are many ways of representing a series of character information items as CH events, so for a given infoset one may end up having multiple Canonical EXI encodings if the canonicality is defined with respect to the input EXI events.

Alessandro Triglia
OSS Nokalva

> -----Original Message-----
> From: John Schneider [mailto:john.schneider@agiledelta.com]
> Sent: Monday, December 21, 2015 17:59
> To: Takuki Kamiya <tkamiya@us.fujitsu.com>
> Cc: public-exi@w3.org
> Subject: Re: Call for opinions on how to represent empty elements in
> Canonical EXI
>
> Note: Approach B also generates the same sequence of events for all
> data types and does not require schema knowledge to work. This latter
> characteristic reduces implementation complexity and yields faster
> processing speeds.
>
> > On Dec 21, 2015, at 1:31 PM, Takuki Kamiya <tkamiya@us.fujitsu.com>
> > wrote:
> >
> > Hi,
> >
> > There are two approaches proposed on how to define rules regarding
> > the encoding of empty elements in schema-informed context.
> >
> > Please provide any opinions as to which of those approaches you
> > consider more appropriate to have as part of Canonical EXI.
> >
> > The behavior of each approach is described below.
> >
> > Approach A: This approach always first tries to encode empty
> > elements (i.e. SE followed by EE, optionally AT, etc. in between) as
> > a sequence of SE CH EE (optionally AT etc. between SE and CH) where
> > CH is used for representing empty string, for elements defined to
> > have simple-content, as long as doing so is possible (i.e. unless
> > the codec in effect does *not* permit to encode empty string "").
> >
> > Approach B: This approach encodes empty elements (i.e. SE followed
> > by EE, optionally AT, etc. in between) as a sequence of SE EE
> > (optionally AT etc. in between). As an exception, for elements
> > defined to have simple-content, it is allowed to insert CH that
> > represents empty string "" between SE and EE only when doing so is
> > necessary for representing an empty element there.
> >
> > Note the approach B provides better efficiency, while approach B
> > leads to generate the same sequence of events whether strict or
> > non-strict mode.
> >
> > Thank you,
> >
> > Takuki Kamiya
> > Fujitsu Laboratories of America
>
> AgileDelta, Inc.
> john.schneider@agiledelta.com
> http://www.agiledelta.com
> w: 425-644-7122
> m: 425-503-3403
> f: 425-644-7126

Received on Tuesday, 22 December 2015 20:02:34 UTC