RE: Constraint on string tables from Vogelheim, Daniel on 2007-08-27 (public-exi@w3.org from August 2007)

From: Vogelheim, Daniel <daniel.vogelheim@siemens.com>
Date: Mon, 27 Aug 2007 16:20:34 +0200
To: "Stanley A. Klein" <sklein@cpcug.org>
Cc: <public-exi@w3.org>
Message-ID: <D720D1DD80557241B9452630315560C30163D6F4@MCHP7IEA.ww002.siemens.net>

Hello Stan,

Thanks for the reply. Here are some more comments from the WG discussion
of your case that I hope will be helpful:

You wrote:
> [...] No on-line negotiation would be involved, just a knowledge that
> the particular implementation is intended for that particular use case.
> [...]

OK. If you are willing to restrict compatibility to a particular user
community, that opens up some additional options:

> You already have something called a pluggable codec, that I don't
> understand.

Pluggable codecs are an extension mechanism that allows user communities
with particular requirements to use EXI as a building block. The spec
contains 1) a mechanism to uniquely identify such pluggable codecs, and
2) a MAY conformance statement, which essentially warns users that by
using their own pluggable codecs they are confined to implementations
that support them.

You could absolutely implement such a custom, pre-populated string table
as a custom codec. Given that much of the logic would be in any
conforming implementation anyway, this should be relatively easy to do.
But, as I said, this would limit compatibility to implementations that
support that codec.
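
To illustrate the effect of pre-population, here is a minimal Python
sketch. It is a toy model only, not the EXI encoding: the class name,
the "hit"/"miss" tuples and the sample strings are all hypothetical. A
string already in the table is emitted as a small index, while a new
string is emitted as a literal and then added to the table.

```python
class StringTable:
    """Toy string table (hypothetical, not the EXI wire format)."""

    def __init__(self, prepopulated=()):
        self._index = {}
        # Both sides must load the same strings in the same order,
        # otherwise the indices will not match.
        for s in prepopulated:
            self._add(s)

    def _add(self, s):
        # Assign the next free index unless the string is already known.
        self._index.setdefault(s, len(self._index))

    def encode(self, s):
        if s in self._index:
            return ("hit", self._index[s])   # compact reference
        self._add(s)
        return ("miss", s)                   # full literal, learned for later


def encode_all(values, prepopulated=()):
    """Encode a sequence of strings with a fresh table."""
    table = StringTable(prepopulated)
    return [table.encode(v) for v in values]
```

With an empty table, the first occurrence of every string is a miss.
With a table pre-populated from the same list on the encoder and the
decoder, even the first occurrence is a hit, which is the gain for
short messages, and also the reason both ends must agree on the list.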

> Regarding the option of using enumerations in the schema, 
> this would be difficult for my use case.  The messages generally 
> consist of object names and object values.  The object names 
> are constructed of standardized sub-strings concatenated in a 
> manner similar to file naming.  The actual construction of the 
> object names is difficult to describe in a schema and
> leads to numerous complications, so it is best to just define 
> the names as strings.  The schema can be of some help regarding 
> strings in element and attribute names, but not the object names.

I'm not sure I fully understand. I had assumed that you'd be able to
assemble some potentially large but finite set of strings that you know
are likely to occur, to pre-populate the string table. If so, it should
be possible to use that same string set to specify an enumeration.

Please note that there is no need to exhaustively describe all possible
strings: EXI uses a schema as an indicator of what is likely to occur;
it does not limit what can be encoded. Also note that for some use cases
it may be useful to have separate schemas for encoding (describing the
likely content) and validation (describing all possible content). What
I'm suggesting here is to use an enumeration to inform the encoder about
the likely content, without imposing the need to describe all possible
(but unlikely) variants.
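
As a rough sketch of how the same string set could be turned into an
enumeration, the following Python snippet generates an XML Schema
simple type from a list of likely values. The function name, the type
name and the sample object names are hypothetical. How a given EXI
implementation consumes such a schema is not shown here.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def enumeration_simple_type(type_name, likely_values):
    """Return an xs:simpleType listing the likely values as an enumeration."""
    ET.register_namespace("xs", XS)
    simple_type = ET.Element(f"{{{XS}}}simpleType", {"name": type_name})
    restriction = ET.SubElement(
        simple_type, f"{{{XS}}}restriction", {"base": "xs:string"}
    )
    # dict.fromkeys drops duplicates while keeping the original order.
    for value in dict.fromkeys(likely_values):
        ET.SubElement(restriction, f"{{{XS}}}enumeration", {"value": value})
    return ET.tostring(simple_type, encoding="unicode")


# Hypothetical object names, for illustration only.
print(enumeration_simple_type("ObjectName", ["unit1/temp", "unit1/pressure"]))
```

The list only needs to contain the strings that are likely to occur. A
separate, more permissive schema can still be used for validation.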

If the issue is an unpredictable set of usually unique strings with a
high proportion of repeated sub-strings (such as the directory part of a
filename), then I'm afraid neither a pre-defined string table nor an
enumeration will be of much help. In that case, the
reordering-plus-compression would likely do rather well. Also, in cases
where a format change is an option, a change in the format where e.g.
each path element receives its own XML element would likely work very
well with the standard algorithm.
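
The difference can be seen with a small Python sketch. The object names
and the "/" separator are made up for illustration. It simply counts
how many strings have been seen before, which is roughly what a string
table can exploit.

```python
def count_repeats(strings):
    """Count strings that were already seen earlier in the sequence."""
    seen = set()
    repeats = 0
    for s in strings:
        if s in seen:
            repeats += 1
        else:
            seen.add(s)
    return repeats


# Hypothetical object names built from standardized sub-strings.
names = [
    "plant1/unit2/temp",
    "plant1/unit2/pressure",
    "plant1/unit3/temp",
]

# As whole strings, every name is unique, so nothing repeats.
whole_repeats = count_repeats(names)

# Split into path elements, the standardized parts repeat.
parts = [part for name in names for part in name.split("/")]
part_repeats = count_repeats(parts)

print(whole_repeats, part_repeats)  # prints "0 4"
```

Each name is unique as a whole, but once every path element is a value
of its own, four of the nine values are repeats.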

> Regarding the option of using fragments.  This would be useful, as
> long as the EXI document/fragment itself is a fictional construct that
> never actually exists and whose history except for string tables and
> some other state information can be forgotten after each message is
> processed.

Yes, that is exactly the idea. Of course, it would require some care in
the implementation to enable this. 
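
A minimal Python sketch of that idea, under the assumption that the
string table is the only state kept between messages. The class and
method names are hypothetical, and the output is a toy representation
rather than EXI.

```python
class MessageStream:
    """Toy encoder that keeps only the string table between messages."""

    def __init__(self):
        self._table = {}  # string -> index, shared by all messages

    def encode_message(self, strings):
        """Encode one message; known strings become integer indices."""
        out = []
        for s in strings:
            if s in self._table:
                out.append(self._table[s])
            else:
                self._table[s] = len(self._table)
                out.append(s)  # first occurrence travels as a literal
        # Nothing else about this message is retained.
        return out


stream = MessageStream()
first = stream.encode_message(["alpha", "beta"])
second = stream.encode_message(["beta", "gamma", "alpha"])
print(first, second)
```

The second message benefits from strings learned in the first one, even
though no enclosing document is ever materialized. The decoder has to
keep the same table in step, so a lost message would need some recovery
strategy.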

Sincerely,
Daniel

Received on Monday, 27 August 2007 14:20:57 UTC