RE: Constraint on string tables from Stanley A. Klein on 2007-08-27 (public-exi@w3.org from August 2007)

From: Stanley A. Klein <sklein@cpcug.org>
Date: Mon, 27 Aug 2007 15:04:00 -0400 (EDT)
To: "Vogelheim, Daniel" <daniel.vogelheim@siemens.com>
Cc: public-exi@w3.org
Message-ID: <18407.207.188.248.157.1188241440.squirrel@www.cpcug.org>
Daniel -

Regarding the enumeration option, would it be possible to provide EXI a
separate auxiliary schema containing only the relevant sub-string
enumerations without actually defining how they are combined into the
strings (which might be required for validation if the enumerations were
actually used in the overall schema)?

For example, in the main schema we have a type which is an object name
defined as a string of a certain maximum length (e.g., 64 characters).  In
the auxiliary schema, we have three enumerations, the first having the
strings Abc, Def, Ghi, and Jkl.  The second has strings mno, pqr, stuv,
and wxyz.  The third has strings aa, bbbb, ccccc, dddddd, and eeeeeee.

The actual object names are strings like Abc.mno.aa, Abc.stuv.dddddd,
Def.wxyz.ccccc, Jkl.pqr.dddddd.wxyz.ccccc, and so on, with the various
sub-strings in the enumerations being formed into object names by
concatenating with dot separators.
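
To make the example concrete, here is a minimal Python sketch of what I have in mind. It is an illustration only and not EXI's actual string-table algorithm; the names ENUMERATIONS, build_table, encode_name, and decode_name are hypothetical. It assumes a table pre-populated from the three enumerations, with object names split on the dot separator so that known sub-strings become small indices and unknown ones stay literal:

```python
# Hypothetical illustration; not part of EXI or of any EXI implementation.
# The three enumerations from the auxiliary schema described above.
ENUMERATIONS = [
    ["Abc", "Def", "Ghi", "Jkl"],
    ["mno", "pqr", "stuv", "wxyz"],
    ["aa", "bbbb", "ccccc", "dddddd", "eeeeeee"],
]


def build_table(enumerations):
    """Assign each known sub-string a compact integer index."""
    table = {}
    for values in enumerations:
        for value in values:
            table.setdefault(value, len(table))
    return table


def encode_name(name, table):
    """Split on dots; known parts become indices, unknown parts stay literal."""
    return [table.get(part, part) for part in name.split(".")]


def decode_name(encoded, table):
    """Rebuild the dotted object name from indices and literal parts."""
    reverse = {index: value for value, index in table.items()}
    return ".".join(reverse[p] if isinstance(p, int) else p for p in encoded)


TABLE = build_table(ENUMERATIONS)
```

Note that the sketch does not say which enumeration may appear in which position, which is the point: the enumerations only tell the encoder which sub-strings are likely, not how they combine.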

Would something like this work with EXI as specified in the draft?

Thanks.


Stan Klein


On Mon, August 27, 2007 10:20 am, Vogelheim, Daniel wrote:
>
> Hello Stan,
>
> Thanks for the reply. Here are some more comments from the WG discussion
> of your case that I hope will be helpful:
>
>
> You wrote:
>> [...] No on-line negotiation would be involved, just a knowledge that the
>> particular implementation is intended for that particular use case. [...]
>
> OK. If you are willing to restrict compatibility to a particular user
> community, that opens some additional options:
>
>> You already have something called a pluggable codec, that I don't
>> understand.
>
> Pluggable codecs are an extension mechanism, to allow user communities
> with some particular requirements to use EXI as a building block. The
> spec contains 1) a mechanism to uniquely identify such pluggable codecs,
> and 2) a MAY conformance statement which essentially warns users that by
> using their own pluggable codecs they are confined to implementations
> that support them.
>
> You could absolutely implement such a custom, pre-populated string table
> as a custom codec. Given that much of the logic would be in any
> conforming implementation anyway, this should be relatively easy to do.
> But, as said, this would limit compatibility.
>
>
>> Regarding the option of using enumerations in the schema,
>> this would be difficult for my use case.  The messages generally
>> consist of object names and object values.  The object names
>> are constructed of standardized sub-strings concatenated in a
>> manner similar to file naming.  The actual construction of the
>> object names is difficult to describe in a schema and
>> leads to numerous complications, so it is best to just define
>> the names as strings.  The schema can be of some help regarding
>> strings in element and attribute names, but not the object names.
>
> I'm not sure I fully understand. I had assumed that you'd be able to
> assemble some potentially large but finite set of strings that you know
> are likely to occur, to pre-populate the string table. If so, it should
> be possible to use that same string set to specify an enumeration.
>
> Please note that there is no need to exhaustively describe all possible
> strings: EXI uses a schema as an indicator of what is likely to occur;
> it does not limit what can be encoded. Also note that for some use cases
> it may be useful to have separate schemas for encoding (describing the
> likely content) and validation (describing all possible content). What
> I'm suggesting here is to use an enumeration to inform the encoder about
> the likely content, without imposing the need to describe all possible
> (but unlikely) variants.
>
> If the issue is an unpredictable set of usually unique strings with a
> high proportion of repeated sub-strings (such as the directory part of a
> filename) then I'm afraid neither a pre-defined string table nor an
> enumeration will be of much help. In that case, the
> reordering-plus-compression would likely do rather well. Also a change
> in the format where e.g. each path element receives its own XML element
> would likely work very well with the standard algorithm, in cases where
> a format change is an option.
>
>
>> Regarding the option of using fragments.  This would be useful, as long as
>> the EXI document/fragment itself is a fictional construct that never
>> actually exists and whose history except for string tables and some other
>> state information can be forgotten after each message is processed.
>
> Yes, that is exactly the idea. Of course, it would require some care in
> the implementation to enable this.
>
>
> Sincerely,
> Daniel
>


--
Received on Monday, 27 August 2007 18:42:55 UTC