- From: Klaus Johannes Rusch <KlausRusch@atmedia.net>
- Date: Tue, 01 Apr 2003 07:44:32 -0100
- To: Terry Teague <terry_teague@users.sourceforge.net>
- Cc: html-tidy@w3.org, tidy-develop@lists.sourceforge.net
Terry Teague wrote:

> In developing the next version of a program using Tidy based code, I am
> needing to add support for input X(HT)ML files using specific file
> extensions, to weed out unwanted files, especially when tidying whole
> directories.
>
> i.e. we don't want to Tidy "mylargeofficesuite.exe" or "mymapoftheworld.jpg".
>
> Here is a list of file extensions I am using at the moment :
>
> /* [1] */ ".html";
> /* [2] */ ".htm";
> /* [3] */ ".text";
> /* [4] */ ".txt";
> /* [5] */ ".xml";
> /* [6] */ ".xhtml";
> /* [7] */ ".asp";
> /* [8] */ ".jsp";
> /* [9] */ ".php"

I would add

.shtml
.shtm
.phtml
.wml (WML 2.0 only)
.?html (maybe, depending on whether or not you want to process cHTML as well)
.?htm (maybe, depending on whether or not you want to process cHTML as well)

and remove .txt.

Microsoft Office products register additional extensions for their HTML templates; try assoc on a Win2000/WinXP machine:

.dochtml=wordhtmlfile
.docmhtml=wordmhtmlfile
.dothtml=wordhtmltemplate
.htm=htmlfile
.html=htmlfile
.htw=htmlfile
.htx=htmlfile
.mht=mhtmlfile
.mhtml=mhtmlfile
.pothtml=powerpointhtmltemplate
.ppthtml=powerpointhtmlfile
.pptmhtml=powerpointmhtmlfile
.shtml=NetscapeMarkup
.xhtml=xhtmlfile
.xlshtml=Excelhtmlfile
.xlsmhtml=excelmhtmlfile
.xlthtml=Excelhtmltemplate

Fragments are also likely to be found in *.ssi or *.inc files.

Depending on what your program does, you may want to let the user specify the extensions, guess the file type by looking at the content of the document, or both.

--
Klaus Johannes Rusch
KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/
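The extension filter discussed in this thread could be sketched in C (the language Tidy itself is written in) as a case-insensitive check on the last extension of the file name. This is only a minimal sketch: the function name `has_markup_extension`, the table contents (a sample merged from the lists above), and treating `?` as a one-character wildcard for the optional cHTML patterns are all assumptions for illustration, not part of Tidy's API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension table, a sample merged from the lists above.
   A '?' matches exactly one character (for the optional cHTML patterns). */
static const char *const markup_exts[] = {
    ".html", ".htm", ".text", ".xml", ".xhtml",
    ".asp", ".jsp", ".php",
    ".shtml", ".shtm", ".phtml", ".wml",
    ".?html", ".?htm",
    ".htw", ".htx", ".mht", ".mhtml", /* a few of the Office/IE ones */
    ".ssi", ".inc"                    /* likely fragments */
};

/* Case-insensitive comparison of a pattern against an extension of the
   same length; '?' in the pattern matches any single character. */
static int ext_matches(const char *pattern, const char *ext)
{
    for (; *pattern && *ext; ++pattern, ++ext) {
        if (*pattern != '?' &&
            tolower((unsigned char)*pattern) != tolower((unsigned char)*ext))
            return 0;
    }
    /* Both strings must end together, so ".htm" does not match ".html". */
    return *pattern == '\0' && *ext == '\0';
}

/* Hypothetical helper: returns 1 if the file name (ignoring any directory
   part) ends in one of the accepted extensions, 0 otherwise. */
int has_markup_extension(const char *path)
{
    const char *base = path;
    const char *p;
    const char *dot;
    size_t i;

    /* Skip the directory part so "site.v2/readme" is not misread. */
    for (p = path; *p; ++p)
        if (*p == '/' || *p == '\\')
            base = p + 1;

    dot = strrchr(base, '.');
    if (dot == NULL)
        return 0;

    for (i = 0; i < sizeof markup_exts / sizeof markup_exts[0]; ++i)
        if (ext_matches(markup_exts[i], dot))
            return 1;
    return 0;
}
```

With this sketch, "index.html" and "header.inc" are accepted while "mylargeofficesuite.exe", "mymapoftheworld.jpg" and "notes.txt" are skipped. A fixed table is cheap to check, but as suggested above, a real tool might read the list from user configuration or fall back to inspecting the file content.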
Received on Tuesday, 1 April 2003 03:45:55 UTC