Re: Magic and EXI Header...

Vogelheim, Daniel wrote:
> Hello Carl,
> 
> You wrote:
>>           Working with different file formats, I am the creator
>> of tools that permit identification of different resources (files). I 
>> have noticed that EXI Structure only contains a 2-bit signature to
>> differentiate it from standard XML Documents. As stated in the
>> draft standard, this may be changed in the future. For me, for
>> easy identification and to avoid conflicts with other formats, this 
>> would require at least a 32-bit signature/magic value. Has the issue
>> been decided on?
>>
>> Any clarification would be greatly appreciated.
> 
> Thanks for sharing your concerns. The group has not yet reached a
> conclusion, so unfortunately I can't really offer you any clarifications
> yet. 
> 
> If you have specific concerns, arguments, or use cases and would want to
> share them with us here, I'd gladly present them to the WG during
> discussions.
> 
> 
> Sincerely,
> Daniel Vogelheim
> 
Greetings,
          Here are my arguments, based both on my personal experience
and on existing standards:


Overview of file format identification
======================================

As we all know, file formats can be identified in several ways. Today,
major operating systems mostly use two of them.

The first one uses the file extension to determine the file format, and
then passes the file to the associated application. If there is no magic
data to clearly identify the file format, it is difficult for
developers to validate the file format. They need to take into
consideration all possible cases (EXI Events should all be well
formed).

Because of this, software developed for this file format is much more
difficult to qualify, and a lot of effort will have to be put into
quality assurance just to take into account all wrongly formed binary
encodings.

The other way, used by most UNIX operating systems, identifies files by
their magic value. This is less error prone because, if the magic is
recognized, it is practically assumed that the rest of the file is valid,
or at least generally follows the required structure. Therefore, not all
error cases of wrongly formed files need to be taken into account. This
simplifies the quality assurance phase of the applications that will
process the file format.
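
A minimal sketch of magic-based identification, in Python, may make
this concrete. The names KNOWN_MAGICS and identify are hypothetical and
chosen only for illustration; the byte values are the published
signatures of PNG, GIF and PDF.

```python
# Illustrative table of magic values (hypothetical name, real signatures).
KNOWN_MAGICS = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF-": "PDF document",
}


def identify(data: bytes) -> str:
    """Return a format name if the data starts with a known magic value."""
    for magic, name in KNOWN_MAGICS.items():
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
    print(identify(b"no signature here"))
```

Only the first few bytes are inspected, so the decision is cheap and
does not depend on parsing the rest of the file.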

Rationale for size of magic value
=================================

The rationale for using a magic value of at least 32 bits is simple.
With the multiplication of file formats in existence, 2-byte
identifiers, as used in some early file formats (on UNIX systems), now
conflict with other file formats and are no longer enough to strictly
and unambiguously identify files of a specific type.
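
As a rough illustration of the size argument, the sketch below computes
the chance that arbitrary data matches a magic value by accident. It
assumes the leading bits of unrelated files are uniformly random, which
real files are not, so the figures are an idealized estimate only. The
function name is hypothetical.

```python
def chance_of_false_match(magic_bits: int) -> float:
    """Probability that uniformly random leading bits equal a fixed magic."""
    return 1.0 / (2 ** magic_bits)


if __name__ == "__main__":
    # Compare the 2-bit EXI distinguishing bits, a 2-byte identifier,
    # and a 32-bit magic value.
    for bits in (2, 16, 32):
        print(f"{bits:2d}-bit magic: 1 in {2 ** bits} random files "
              f"matches by accident")
```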

Rationale for value of magic value
==================================
As specified in the ISO/IEC 15444 (JPEG2000) standard, as well as in the
ISO/IEC 15948 (PNG Specification), the magic value, or file signature,
can be used for these purposes:

- Permits immediate detection of common file-transfer problems.
- The magic value should contain a CR-LF sequence, which permits
   catching bad file transfers that alter newline sequences.
- The control-Z character stops file display under MS-DOS. The final
   line feed checks for the inverse of the CR-LF translation problem.
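
The transfer problems listed above can be simulated to show how the
signature catches them. This is a sketch only: the three helper
functions are hypothetical, deliberately naive models of a text-mode
transfer, and the PNG signature is used as the reference value. The
high-bit check is part of the PNG rationale for the leading \211 byte.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def crlf_to_lf(data: bytes) -> bytes:
    """Model a transfer that rewrites CR-LF as LF."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Model the inverse problem: every LF becomes CR-LF."""
    return data.replace(b"\n", b"\r\n")


def strip_high_bit(data: bytes) -> bytes:
    """Model a 7-bit channel that clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    for damage in (crlf_to_lf, lf_to_crlf, strip_high_bit):
        damaged = damage(PNG_SIGNATURE)
        print(damage.__name__, "detected:", damaged != PNG_SIGNATURE)
```

A plain ASCII tag with no newline bytes, such as b"EXI1", would pass
through all three transformations unchanged, so the damage would only
show up later, somewhere in the body of the file.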

Therefore, a good model for a signature, based on the ISO standards
above, could be one of the following (I am not the one who decides on
the signature; it is up to you, and these are merely general suggestions):

ASCII C notation:
    \211 E X I \r \n \032 \n or
    \211 X M L \r \n \032 \n or
    something similar to that.
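
For reference, the first candidate above can be written out byte by
byte. This is only a sketch of a suggestion, not a value defined by any
EXI draft; CANDIDATE_SIGNATURE and has_signature are hypothetical names.

```python
# Octal escapes as in the C notation: \211 is 0x89 and \032 is 0x1A (control-Z).
CANDIDATE_SIGNATURE = b"\211EXI\r\n\032\n"


def has_signature(data: bytes) -> bool:
    """Check whether data begins with the suggested 8-byte signature."""
    return data.startswith(CANDIDATE_SIGNATURE)


if __name__ == "__main__":
    print(len(CANDIDATE_SIGNATURE), "bytes:",
          " ".join(f"{b:02x}" for b in CANDIDATE_SIGNATURE))
    print(has_signature(CANDIDATE_SIGNATURE + b"rest of stream"))
```

At 8 bytes (64 bits), such a signature comfortably exceeds the 32-bit
minimum argued for earlier.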


If you need further clarifications, please let me know.

Thanks for taking the time to consider my proposition,
Sincerely yours,
Carl Eric Codère
http://www.optimasc.com

Received on Friday, 5 October 2007 03:44:33 UTC