Re: Identifying (X)HTML without MIME

On Sun, 7 Nov 2004 09:48, James Cerra wrote:
> Now HTML was originally designed for transport over
> the web via HTTP and identification via MIME types.
> However, there are cases where (X)HTML may be
> transmitted with no MIME type information available,
> e.g. reading a file from a FAT disk or through standard
> I/O.  I'm writing a program where this type of
> situation may come up.  The specs are silent on the
> issue, so: What are the recommendations for
> identifying the document's type when MIME or HTTP is
> not available?

Easy enough.

If it starts with "<?xml" it's an XML document.  If the root element is then 
in the XHTML1 namespace, it's XHTML1.  If it's in the XHTML2 namespace, it's 
XHTML2.  You can see that Microsoft already do this to some extent with their 
WordML format (it gets a different icon from other XML files, even when it's 
named file.xml).  The file(1) command on *nix also inspects the content to 
distinguish between different XML formats and determine the MIME type.

If it doesn't start with "<?xml" but has a DOCTYPE near the top, then it's 
SGML, and you apply similar rules based on what follows the DOCTYPE keyword.
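
To make the rules above concrete, here is a rough sketch in Python.  It is 
only an illustration, not a reference implementation: the names sniff_markup, 
root_namespace and the 1024-byte "near the top" window are made up for this 
example, the XHTML2 namespace URI is my assumption taken from the working 
drafts, and only a UTF-8 byte order mark is handled (UTF-16 input is not).

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs. The XHTML 2 value is an assumption based on the working
# drafts; check it against the draft you actually target.
XHTML1_NS = "http://www.w3.org/1999/xhtml"
XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"

# Captures the document type name that follows the DOCTYPE keyword.
DOCTYPE_RE = re.compile(rb"<!DOCTYPE\s+([A-Za-z][\w.-]*)", re.IGNORECASE)


def root_namespace(data: bytes) -> str:
    """Return the root element's namespace URI, or an empty string."""
    parser = ET.XMLPullParser(events=("start",))
    parser.feed(data)  # parse errors are queued and raised by read_events()
    try:
        for _event, elem in parser.read_events():
            tag = elem.tag  # "{namespace}local" or just "local"
            if tag.startswith("{"):
                return tag[1:].split("}", 1)[0]
            return ""
    except ET.ParseError:
        pass  # malformed before the root element was seen
    return ""


def sniff_markup(data: bytes, window: int = 1024) -> str:
    """Guess the markup family of a document from its content alone."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    if data.startswith(b"<?xml"):
        # Tolerate a trailing slash on the namespace URI.
        ns = root_namespace(data).rstrip("/")
        if ns == XHTML1_NS:
            return "xhtml1"
        if ns == XHTML2_NS:
            return "xhtml2"
        return "xml"
    # Not XML by the first rule: look for a DOCTYPE near the top.
    match = DOCTYPE_RE.search(data[:window])
    if match:
        name = match.group(1).decode("ascii", "replace").lower()
        return "html" if name == "html" else "sgml"
    return "unknown"
```

You would call it on the first chunk read from the file or from standard 
input, e.g. sniff_markup(sys.stdin.buffer.read(4096)), and then map the 
returned label to whatever MIME type your program expects.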

TX

-- 
             Email: Trejkaz Xaoza <trejkaz@xaoza.net>
          Web site: http://xaoza.net/
         Jabber ID: trejkaz@jabber.xaoza.net
   GPG Fingerprint: 9EEB 97D7 8F7B 7977 F39F  A62C B8C7 BC8B 037E EA73

Received on Monday, 8 November 2004 21:07:14 UTC