- From: Carl Eric Codere <carl.codere@optimasc.com>
- Date: Thu, 04 Oct 2007 23:31:46 -0400
- To: "Vogelheim, Daniel" <daniel.vogelheim@siemens.com>
- CC: public-exi@w3.org
Vogelheim, Daniel wrote:
> Hello Carl,
>
> You wrote:
>> Working with different file formats, I am the creator
>> of tools that permit identification of different resources (files). I
>> have noticed that EXI Structure only contains a 2-bit signature to
>> differentiate it from standard XML Documents. As stated in the
>> draft standard, this may be changed in the future. For me, for
>> easy identification and to avoid conflicts with other formats, this
>> would require at least a 32-bit signature/magic value. Has the issue
>> been decided on?
>>
>> Any clarification would be greatly appreciated.
>
> Thanks for sharing your concerns. The group has not yet reached a
> conclusion, so unfortunately I can't really offer you any clarifications
> yet.
>
> If you have specific concerns, arguments, or use cases and would want to
> share them with us here, I'd gladly present them to the WG during
> discussions.
>
>
> Sincerely,
> Daniel Vogelheim
>

Greetings,

Here are my arguments, based on both my personal experience and on
existing standards:

Overview of file format identification
======================================
As we all know, file formats can be identified in several ways; today,
major operating systems mostly use two of them. The first relies on the
file extension to determine the file format, and then passes the file to
the associated application. If there is no magic data that clearly
identifies the file format, developers cannot easily validate it. They
need to take all possible cases into consideration (EXI events should all
be well formed). Because of this, software developed for such a file
format is much more difficult to qualify, and a lot of quality assurance
effort is needed just to take all the malformed binary encodings into
account.

On the other hand, the other way of identifying files, as used by most
UNIX operating systems, identifies files by their magic value.
This is less error prone because, if the magic is recognized, it can
practically be assumed that the rest of the file is valid, or at least
that it generally follows the required structure. Therefore, not every
case of a malformed file needs to be taken into account. This simplifies
the quality assurance phase of the applications that will process the
file format.

Rationale for size of magic value
=================================
The rationale for using a magic value of at least 32 bits is simple. With
the multiplication of file formats in existence, 2-byte identifiers, as
used in some early file formats (on UNIX systems), now conflict with other
file formats and are no longer enough to strictly and unambiguously
identify files of a specific type.

Rationale for value of magic value
===================================
As specified in the ISO/IEC 15444 (JPEG2000) standard, as well as in
ISO/IEC 15948 (PNG Specification), the magic value, or file signature,
can be used for these purposes:
- Permits immediate detection of common file-transfer problems.
- The magic value should contain a CR-LF sequence, which permits catching
  bad file transfers that alter newline sequences.
- The control-Z character stops file display under MS-DOS.
- The final line feed checks for the inverse of the CR-LF translation
  problem.

Therefore a good model for a signature, based on the ISO standards above,
could be (I'm not the one who decides on the signature, it's up to you;
these are merely general suggestions):

ASCII C notation:
\211 E X I \r \n \032 \n
or
\211 X M L \r \n \032 \n
or something similar to that.

If you need further clarifications, please let me know.

Thanks for taking the time to consider my proposition,

Sincerely yours,
Carl Eric Codère
http://www.optimasc.com
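
To illustrate how a tool could use the first suggested signature
(\211 E X I \r \n \032 \n), here is a minimal sketch in C. It is only an
illustration under assumptions: the signature is the suggestion made in
this message, not a value adopted by any draft, and the names EXI_MAGIC,
magic_result and check_magic are hypothetical. The high-bit check follows
the PNG rationale for starting the signature with a non-ASCII byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 8-byte signature suggested in this message, modeled on the
 * PNG signature: \211 E X I \r \n \032 \n (not part of any EXI draft). */
static const unsigned char EXI_MAGIC[8] = {
    0x89, 'E', 'X', 'I', 0x0D, 0x0A, 0x1A, 0x0A
};

enum magic_result {
    MAGIC_OK,                /* all 8 bytes match                          */
    MAGIC_TRUNCATED,         /* fewer than 8 bytes, but a valid prefix     */
    MAGIC_HIGH_BIT_STRIPPED, /* 0x89 arrived as 0x09: 7-bit transfer       */
    MAGIC_TAIL_ALTERED,      /* \211EXI matches, the CR-LF/^Z/LF tail does
                                not: probably a newline translation        */
    MAGIC_MISMATCH           /* some other file format                     */
};

/* Classify the first `len` bytes of a file against the signature. */
enum magic_result check_magic(const unsigned char *buf, size_t len)
{
    if (len >= sizeof EXI_MAGIC) {
        if (memcmp(buf, EXI_MAGIC, sizeof EXI_MAGIC) == 0)
            return MAGIC_OK;
    } else if (memcmp(buf, EXI_MAGIC, len) == 0) {
        return MAGIC_TRUNCATED;
    }

    /* A channel that clears bit 7 turns 0x89 into 0x09. */
    if (len >= 4 && buf[0] == 0x09 && memcmp(buf + 1, "EXI", 3) == 0)
        return MAGIC_HIGH_BIT_STRIPPED;

    /* The first four bytes are intact, so the damage is in the tail. */
    if (len >= 4 && memcmp(buf, EXI_MAGIC, 4) == 0)
        return MAGIC_TAIL_ALTERED;

    return MAGIC_MISMATCH;
}
```

A caller would read the first 8 bytes of the file and continue decoding
only on MAGIC_OK; the other results let it report a damaged transfer
instead of a generic parse error.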
Received on Friday, 5 October 2007 03:44:33 UTC