Re: [Tidy-dev] What are typical file extensions for X(HT)ML documents?

Terry Teague wrote:

> In developing the next version of a program using Tidy based code, I am
> needing to add support for input X(HT)ML files using specific file
> extensions, to weed out unwanted files, especially when tidying whole
> directories.
>
> i.e. we don't want to Tidy "mylargeofficesuite.exe" or "mymapoftheworld.jpg".
>
> Here is a list of file extensions I am using at the moment :
>
>         /* [1] */ ".html";
>         /* [2] */ ".htm";
>         /* [3] */ ".text";
>         /* [4] */ ".txt";
>         /* [5] */ ".xml";
>         /* [6] */ ".xhtml";
>         /* [7] */ ".asp";
>         /* [8] */ ".jsp";
>         /* [9] */ ".php"

I would add

.shtml
.shtm
.phtml
.phtml

*.wml (WML 2.0 only)

.?html (maybe, depends on whether or not you want to process cHTML also)
.?htm (maybe, depends on whether or not you want to process cHTML also)

and remove

.txt

Microsoft office products register additional extensions for their HTML
templates, try assoc on a Win2000/WinXP machine:

.dochtml=wordhtmlfile
.docmhtml=wordmhtmlfile
.dothtml=wordhtmltemplate
.htm=htmlfile
.html=htmlfile
.htw=htmlfile
.htx=htmlfile
.mht=mhtmlfile
.mhtml=mhtmlfile
.pothtml=powerpointhtmltemplate
.ppthtml=powerpointhtmlfile
.pptmhtml=powerpointmhtmlfile
.shtml=NetscapeMarkup
.xhtml=xhtmlfile
.xlshtml=Excelhtmlfile
.xlsmhtml=excelmhtmlfile
.xlthtml=Excelhtmltemplate

Fragments are likely to be found in *.ssi or *.inc also.

Depending on what your program does, you may want to let the user specify
extensions, or guess the file type by looking at the content of the document, or
both.

--
Klaus Johannes Rusch
KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/

Received on Tuesday, 1 April 2003 03:45:55 UTC