
Re: Direct input doesn't take XML declaration into account for parsing mode selection

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Thu, 09 Feb 2006 02:11:28 +1100
Message-ID: <43EA0A20.5050809@lachy.id.au>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: Dominique Hazael-Massieux <dom@w3.org>, www-validator@w3.org

Bjoern Hoehrmann wrote:
> * Dominique Hazael-Massieux wrote:
>> When using the direct input form for validation with a FPI that the
>> system doesn't recognize, the validator defaults to an SGML-parsing,
>> even when there is an XML declaration at the top of the input. I think
>> the XML declaration should be a good enough hint to switch the
>> XML-parsing.
>   <?xml version='1.0'?>
>   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
>   <HTML LANG=de>
>   <HEAD>
>   ...
> That's perfectly legal HTML content. The textarea validation essentially 
> assumes text/html input and since W3C refuses to define how to tell HTML 
> and non-HTML text/html content apart,

This could be solved by providing an explicit Content-Type field for the 
direct input.  It's relatively easy to implement; I did it myself in 
about 3 hours.  It only took that long because I had never written Perl 
before, so I had to learn quickly while also trying to understand what 
the script was doing.  Anyway, here's what I did.  It's not perfect, and 
I'm sure there are issues with it, but it's a start.

In the check script (line numbers match the script in version 0.7.1 and 
may have changed in later updates):

Around line 340, add this line:

$File->{Opt}->{'ContentType'} =
    $q->param('content-type') ? $q->param('content-type') : '';

Around line 392, add this:

if (&conflict($File->{Opt}->{ContentType}, '(detect automatically)')) {
   my ($mode, $ct, $charset) = &parse_content_type($File, 

   $File->{Mode}            = $mode;
   $File->{ContentType}     = $ct;
   $File->{Charset}->{HTTP} = lc $charset;

Finally, in the markup for the direct input form, add this:

<select name="content-type">
   <option value="text/html;charset=UTF-8" selected="selected">text/html;charset=UTF-8 (HTML only)</option>
   <option value="application/xhtml+xml">application/xhtml+xml (recommended for XHTML)</option>
   <option value="application/xml">application/xml</option>
   <option value="text/xml">text/xml (not recommended)</option>
   <option value="image/svg+xml">image/svg+xml</option>
</select>

This would probably also be useful for file upload and URI validation on 
the extended interfaces, although they would need a "(detect 
automatically)" option added to the form.
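
As a rough illustration of the idea, here is a minimal standalone sketch 
of how a submitted content-type value could be split into a parse mode, 
a media type and a charset.  This is not the validator's own 
parse_content_type; the name sketch_parse_content_type and the mode 
strings 'SGML' and 'XML' are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the validator's parse_content_type).
# Splits a value such as "text/html;charset=UTF-8" into
# (parse mode, media type, charset).
sub sketch_parse_content_type {
    my ($value) = @_;
    $value = '' unless defined $value;

    # The media type comes first; parameters follow, separated by ';'.
    my ($ct, @params) = split /\s*;\s*/, lc $value;
    $ct = '' unless defined $ct;

    # Pick out the charset parameter, if there is one.
    my $charset = '';
    for my $param (@params) {
        if ($param =~ /^charset\s*=\s*"?([^"\s]+)"?$/) {
            $charset = $1;
        }
    }

    # Assumption: text/html selects SGML parsing; text/xml,
    # application/xml and any "+xml" type select XML parsing.
    my $mode = '';
    if ($ct eq 'text/html') {
        $mode = 'SGML';
    }
    elsif ($ct =~ m{^(?:text/xml|application/xml|[^/]+/[^/]+\+xml)$}) {
        $mode = 'XML';
    }

    return ($mode, $ct, $charset);
}

my ($mode, $ct, $charset) =
    sketch_parse_content_type('text/html;charset=UTF-8');
print "$mode $ct $charset\n";
```

With the default option from the form above this yields SGML mode, and 
the other four options all yield XML mode.  An empty or unknown value 
leaves the mode empty, which is where automatic detection could take 
over.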

Lachlan Hunt
Received on Wednesday, 8 February 2006 15:11:33 UTC
