Re: Call for opinions on how to represent empty elements in Canonical EXI from Alessandro Triglia on 2015-12-22 (public-exi@w3.org from December 2015)

From: Alessandro Triglia <sandro@oss.com>
Date: Tue, 22 Dec 2015 15:02:06 -0500
To: <public-exi@w3.org>
Message-ID: <032b01d13cf3$a25b0a90$e7111fb0$@oss.com>

Hi

I, too, prefer approach B.  When the EXI stream encoder receives an EE event
from the user application, it can easily decide, just by looking at the
current grammar, whether it is necessary to generate an extra event (a CH
"") before the EE event.  The extra event will be necessary in strict mode
and will not be necessary in non-strict mode, and therefore the events
generated in the two modes will be different. I don't see any problem with
this.
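
A minimal sketch of that decision, purely for illustration.  It assumes a
much-simplified model in which the "current grammar" is just the set of
event types it permits; the function name and the two grammar sets are
hypothetical and are not taken from the draft or from any EXI library.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]  # (event type, value); the value is "" for EE


def encode_end_element(permitted: Set[str]) -> List[Event]:
    """Approach B: emit EE directly if the current grammar permits it,
    otherwise insert an empty CH first so that EE becomes reachable."""
    if "EE" in permitted:
        return [("EE", "")]
    if "CH" in permitted:
        return [("CH", ""), ("EE", "")]
    raise ValueError("current grammar permits neither EE nor CH")


# Assumed grammar states right after SE of a simple-content element.
# Strict mode: only CH is available.  Non-strict mode: EE is available too.
STRICT_AFTER_SE = {"CH"}
NON_STRICT_AFTER_SE = {"CH", "EE"}

print(encode_end_element(STRICT_AFTER_SE))
print(encode_end_element(NON_STRICT_AFTER_SE))
```

A real encoder would consult the actual productions (including any AT
productions) rather than a flat set, but the check itself stays this local.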

By the way, are you guys also considering the case of a user application
actually sending a CH "" followed by an EE?  I would think in this case the
canonical EXI encoding should be the same as if the application had sent
just an EE and no CH.  This is because the underlying infoset in the two
cases is the same.

More broadly, I am also wondering what happens in the case of an application
sending a series of CH events instead of a single larger one.  For example,
for an element <e>Jonathan</e>, the application could send SE CH "" CH
"Jon" CH "athan" CH "" CH "" EE.  This is regardless of the schema-informed
or schemaless mode in use.  From the infoset point of view, the character
information item children of the element information item are just
J o n a t h a n.  So whether the application sends SE CH "" CH "Jon" CH
"athan" EE or SE CH "Jonathan" EE, the *canonical* EXI encoding should be
the same, no?  It is unclear to me what is really meant
by "canonical" in "Canonical" EXI.  Is Canonical EXI intended to be
"canonical" with respect to the infoset (meaning that given a particular
infoset and a set of EXI options, the resulting EXI stream must be
completely determined)? Is it intended to be "canonical" with respect to a
source of input EXI events coming from an application (meaning that given a
particular series of input EXI events and a set of EXI options, the
resulting EXI stream must be completely determined)?  The draft uses the
phrase "logically equivalent within an application context", but I don't
understand what that means. The EXI Recommendation says, "Each event in an
EXI stream participates in a mapping system that relates events to XML
Information Items so that an EXI document or an EXI fragment as a whole
serves to represent an XML Information Set", but then there are many ways of
representing a series of character information items as CH events, so for a
given infoset one may end up having multiple Canonical EXI encodings if the
canonicality is defined with respect to the input EXI events.
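
To make the infoset-oriented reading concrete, here is a rough sketch of a
pre-encoding step that maps every such CH sequence to a single form.  This
is only one possible interpretation, not something the draft specifies; the
event representation and the function name are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (event type, value), e.g. ("SE", "e"), ("CH", "Jon")


def normalize_characters(events: List[Event]) -> List[Event]:
    """Drop empty CH events and merge adjacent CH events into one."""
    out: List[Event] = []
    for kind, value in events:
        if kind == "CH":
            if value == "":
                continue  # contributes no character information items
            if out and out[-1][0] == "CH":
                # Extend the previous CH instead of emitting a new one.
                out[-1] = ("CH", out[-1][1] + value)
                continue
        out.append((kind, value))
    return out


split = [("SE", "e"), ("CH", ""), ("CH", "Jon"), ("CH", "athan"),
         ("CH", ""), ("CH", ""), ("EE", "")]
whole = [("SE", "e"), ("CH", "Jonathan"), ("EE", "")]

# Both inputs describe the same character children, so they normalize alike.
print(normalize_characters(split) == normalize_characters(whole))
```

Under this reading, SE CH "" EE also normalizes to SE EE, and the encoder
would then decide separately whether the grammar requires an empty CH.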

Alessandro Triglia
OSS Nokalva



> -----Original Message-----
> From: John Schneider [mailto:john.schneider@agiledelta.com]
> Sent: Monday, December 21, 2015 17:59
> To: Takuki Kamiya <tkamiya@us.fujitsu.com>
> Cc: public-exi@w3.org
> Subject: Re: Call for opinions on how to represent empty elements in 
> Canonical EXI
> 
> Note: Approach B also generates the same sequence of events for all 
> data types and does not require schema knowledge to work. This latter 
> characteristic reduces implementation complexity and yields faster 
> processing speeds.
> 
> > On Dec 21, 2015, at 1:31 PM, Takuki Kamiya <tkamiya@us.fujitsu.com> wrote:
> >
> > Hi,
> >
> > There are two approaches proposed on how to define rules regarding 
> > the encoding of empty elements in schema-informed context.
> >
> > Please provide any opinions as to which of those approaches you 
> > consider more appropriate to have as part of Canonical EXI.
> >
> > The behavior of each approach is described below.
> >
> > Approach A: This approach always first tries to encode empty 
> > elements (i.e. SE followed by EE, optionally AT, etc. in between) as 
> > a sequence of SE CH EE (optionally AT etc. between SE and CH) where 
> > CH is used for representing empty string, for elements defined to 
> > have simple-content, as long as doing so is possible (i.e. unless 
> > the codec in effect does *not* permit encoding the empty string "").
> >
> > Approach B: This approach encodes empty elements (i.e. SE followed 
> > by EE, optionally AT, etc. in between) as a sequence of SE EE 
> > (optionally AT etc. in between). As an exception, for elements 
> > defined to have simple-content, it is allowed to insert CH that 
> > represents empty string "" between SE and EE only when doing so is 
> > necessary for representing an empty element there.
> >
> > Note that approach B provides better efficiency, while approach A 
> > leads to generating the same sequence of events whether in strict or 
> > non-strict mode.
> >
> > Thank you,
> >
> > Takuki Kamiya
> > Fujitsu Laboratories of America
> >
> >
> >
> >
> 
> AgileDelta, Inc.
> john.schneider@agiledelta.com
> http://www.agiledelta.com
> w: 425-644-7122
> m: 425-503-3403
> f: 425-644-7126
> 
> 
>
Received on Tuesday, 22 December 2015 20:02:34 UTC