
Re: Direct input doesn't take XML declaration into account for parsing mode selection

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Wed, 08 Feb 2006 09:32:53 +0100
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: www-validator@w3.org
Message-Id: <1139387573.6694.122.camel@cumulustier>
On Wednesday, 08 February 2006 at 08:58 +0100, Bjoern Hoehrmann wrote:
> * Dominique Hazael-Massieux wrote:
> >When using the direct input form for validation with an FPI that the
> >system doesn't recognize, the validator defaults to SGML parsing,
> >even when there is an XML declaration at the top of the input. I think
> >the XML declaration should be a good enough hint to switch to
> >XML parsing.
> 
>   <?xml version='1.0'?>
>   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
>   <HTML LANG=de>
>   <HEAD>
>   ...
> 
> That's perfectly legal HTML content. The textarea validation essentially
> assumes text/html input and since W3C refuses to define how to tell HTML
> and non-HTML text/html content apart, I'm not sure there is much we can
> do to resolve this, other than not assuming text/html. The question
> would then be what to assume, if anything.

I guess I was suggesting that a better algorithm than always assuming
SGML parsing for direct input would be the following:
* DOCTYPE known   -> use the appropriate parsing mode
* DOCTYPE unknown -> XML declaration present -> XML validation
                  -> no XML declaration      -> SGML validation

Of course, an XML declaration can legally appear in input that is parsed
as SGML, but since this is a case where we need more hints rather than
fewer, I think it's fairly safe to default to XML validation when
encountering an XML declaration.
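
As a rough sketch of the decision above, in Python: the table of known
FPIs, the function name and the regular expressions are all made up for
illustration and are not taken from the validator's code. It only
handles PUBLIC identifiers and treats an XML declaration as a hint only
when it is the very first thing in the input.

```python
import re

# Hypothetical table of recognized FPIs and their parsing modes.
# The real validator keeps its own list; these entries are illustrative.
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "sgml",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xml",
}

# Captures the public identifier (FPI) of a DOCTYPE declaration.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)

# An XML declaration only counts if it opens the document.
XML_DECL_RE = re.compile(r'<\?xml\s')


def choose_parsing_mode(text, known=KNOWN_DOCTYPES):
    """Return "xml" or "sgml" for a directly entered document."""
    text = text.lstrip("\ufeff")  # ignore a leading byte order mark
    match = DOCTYPE_RE.search(text)
    fpi = match.group(2) if match else None

    # DOCTYPE known -> use the mode registered for it.
    if fpi in known:
        return known[fpi]

    # DOCTYPE unknown (or absent) -> let the XML declaration decide.
    if XML_DECL_RE.match(text):
        return "xml"
    return "sgml"
```

With this ordering, the HTML 4.01 example quoted above still gets SGML
parsing, because its FPI is recognized before the XML declaration is
ever consulted; only input with an unrecognized FPI falls through to
the XML declaration check.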

Dom
-- 
Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/
W3C/ERCIM
mailto:dom@w3.org

Received on Wednesday, 8 February 2006 08:33:47 GMT
