- From: Fred P. <fprog26@hotmail.com>
- Date: Fri, 18 Feb 2005 19:37:36 -0500
- To: sdw@lig.net, RogerCutler@chevrontexaco.com
- Cc: public-xml-binary@w3.org
Hi Mr. Williams,

> We don't want to get too far afield with operating system details, but the
> 'problem' with ls and the 'shell' issues mentioned in an earlier note are
> things like ls doing a qsort by default on filenames before returning them
> and operating system limitations of command line length. While in
> Unix/Linux command line length is relatively huge, wildcards must be
> expanded by the shell before being passed to a command like ls, which isn't
> reasonable when the list needs to be gigantic. As a result, you need to
> use commands like 'find' and 'xargs' when dealing with large numbers of
> files.

Exactly. Although it seems that tcsh on SunOS 5.9 works well with 40,000+
files. I didn't have enough quota on /tmp to create more than 1 GB.

> I believe that JAR files are usually compressed zip files, although
> certainly the components don't need to be compressed.

I did some research on that. In the old days of JDK 1.0.x and JDK 1.1.x,
JAR files were not compressed at all; compression was introduced in JDK
1.2.x. To my knowledge, .zip files were compressed and .jar files were not;
however, nowadays it seems that both can be compressed or not using ZLIB.
Some people still don't ship compressed JARs, due to old JRE issues or
because the cost of decompression is not worth the bandwidth saving.

JAR = Java ARchive
http://www.rgagnon.com/javadetails/java-0153.html

> The problem I have with the ZIP file format approach to representing
> arbitrary XML is that it's not going to be efficient for every case.

Some use cases, like FIXML, are better served by less verbose XML than by
anything else, or by other standards like WBXML:
http://www.w3.org/TR/wbxml/

Also, you might notice that JAR files have a META-INF/ directory, so they
can be digitally SIGNED and indexed. Tools already exist on almost every
platform to deal with this format.
http://java.sun.com/j2se/1.3/docs/guide/jar/jar.html

> Some of the characteristics that make it somewhat useful should be
> considered in a new format, but it is designed for the granularity of
> files, not tags, and it doesn't seem especially elegant for representing
> many proposed instances of data.

Could you be more precise? As far as I'm concerned, the granularity of
files equals that of XML tags, since the content of a given file is the
children of some XML tag, where the file name without the extension is
equivalent to the XPath of that tag.

I don't see what's not elegant in having:

/html.xml
/html/body/img[1].svgz
/html/body/img[2].gif
/html/body/img[3].png

It's very easy to parse: if you trim the extension, you get pure XPath
with content attached, i.e.

s/\.[a-zA-Z0-9_\$\!]+$//;

/html             --> returns all child nodes of /html
/html/body/img[1] --> returns all child nodes of /html/body/img[1]
                      --> the binary SVGZ
/html/body/img[2]
/html/body/img[3]

(The first sketch in the P.S. below shows this mapping in Java.)

Another thing is that some files may be compressed while others are not,
for efficiency purposes: e.g. you compress the XML file, but you don't
compress binary dumps of float arrays, MP3s, JPEGs, ... (see the second
sketch in the P.S. below).

The only remaining problem I can see is Unicode XML tag names, which could
be fixed with 7-Zip or another revision/extension of the ZIP file format.
Another way around it would be to 'encode' them in &#decimal; notation or
in hex notation. So <Lotus₁₂₃> (i.e. <Lotus\u2081\u2082\u2083>) could be
represented as one of these, maybe:

/Lotus₁₂₃.xml
/Lotus#8321#8322#8323.xml
/Lotus%2081%2082%2083.xml

(The third sketch in the P.S. below shows the decimal encoding.)

If you have any more comments, suggestions, improvements or feedback,
please send them! =)

Sincerely yours,
Fred.
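P.S. A few rough Java sketches to make the above concrete. These are my
own illustrations, not from any spec; the class names and file names are
made up. First, reading an archive and trimming the extension of each
entry name to recover the XPath, just like the substitution above:

    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public class ListXPaths {
        public static void main(String[] args) throws Exception {
            ZipFile zip = new ZipFile(args[0]);
            for (Enumeration e = zip.entries(); e.hasMoreElements(); ) {
                String name = ((ZipEntry) e.nextElement()).getName();
                // Same idea as s/\.[a-zA-Z0-9_\$\!]+$//; drop the
                // extension to recover the XPath of the node whose
                // content the entry carries.
                String xpath =
                    "/" + name.replaceFirst("\\.[a-zA-Z0-9_$!]+$", "");
                System.out.println(name + " --> " + xpath);
            }
            zip.close();
        }
    }

So "html/body/img[1].svgz" prints as "/html/body/img[1]".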
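Second, per-entry compression with plain java.util.zip: deflate the XML,
but store an already-compressed JPEG as-is, since deflating it again
wastes CPU for no size gain. STORED entries must declare their size and
CRC-32 before writing:

    import java.io.*;
    import java.util.zip.*;

    public class MixedJar {
        public static void main(String[] args) throws IOException {
            ZipOutputStream out =
                new ZipOutputStream(new FileOutputStream("doc.jar"));

            // XML is text and compresses well: deflate it.
            ZipEntry xml = new ZipEntry("html.xml");
            xml.setMethod(ZipEntry.DEFLATED);
            out.putNextEntry(xml);
            out.write("<html><body/></html>".getBytes("UTF-8"));
            out.closeEntry();

            // A JPEG is already compressed: store it uncompressed.
            byte[] jpeg = readFile("photo.jpg");
            CRC32 crc = new CRC32();
            crc.update(jpeg);
            ZipEntry img = new ZipEntry("html/body/img[2].jpg");
            img.setMethod(ZipEntry.STORED);
            img.setSize(jpeg.length);
            img.setCrc(crc.getValue());
            out.putNextEntry(img);
            out.write(jpeg);
            out.closeEntry();

            out.close();
        }

        static byte[] readFile(String name) throws IOException {
            InputStream in = new FileInputStream(name);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            for (int n; (n = in.read(chunk)) != -1; )
                buf.write(chunk, 0, n);
            in.close();
            return buf.toByteArray();
        }
    }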
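Third, one possible way (my own sketch, again) to encode a non-ASCII tag
name into an ASCII-safe file name, following the #<decimal> variant above:

    public class EncodeTagName {
        // Keep ASCII characters; turn everything else into
        // #<decimal code point>, as in /Lotus#8321#8322#8323.xml
        static String encode(String tag) {
            StringBuffer out = new StringBuffer();
            for (int i = 0; i < tag.length(); i++) {
                char c = tag.charAt(i);
                if (c < 128) out.append(c);
                else out.append('#').append((int) c);
            }
            return out.toString();
        }

        public static void main(String[] args) {
            // <Lotus\u2081\u2082\u2083> becomes Lotus#8321#8322#8323.xml
            System.out.println(encode("Lotus\u2081\u2082\u2083") + ".xml");
        }
    }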
Received on Saturday, 19 February 2005 00:38:33 UTC