
Re: Direct input doesn't take XML declaration into account for parsing mode selection

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Thu, 09 Feb 2006 02:11:28 +1100
Message-ID: <43EA0A20.5050809@lachy.id.au>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: Dominique Hazael-Massieux <dom@w3.org>, www-validator@w3.org

Bjoern Hoehrmann wrote:
> * Dominique Hazael-Massieux wrote:
>> When using the direct input form for validation with a FPI that the
>> system doesn't recognize, the validator defaults to an SGML-parsing,
>> even when there is an XML declaration at the top of the input. I think
>> the XML declaration should be a good enough hint to switch the
>> XML-parsing.
> 
>   <?xml version='1.0'?>
>   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
>   <HTML LANG=de>
>   <HEAD>
>   ...
> 
> That's perfectly legal HTML content. The textarea validation essentially 
> assumes text/html input and since W3C refuses to define how to tell HTML 
> and non-HTML text/html content apart,

This could be solved by providing an explicit Content-Type field for the 
direct input.  It's relatively easy to implement; I did it myself in 
about 3 hours.  It only took that long because I had never written Perl 
before, so I had to learn quickly while trying to understand what the 
script was doing.  Anyway, here's what I did.  It's not perfect, and I'm 
sure there are issues with it, but it's a start.

In the check script (line numbers match version 0.7.1 of the script; 
they may have changed in later updates):

Around line 340, add this line:

$File->{Opt}->{'ContentType'} = $q->param('content-type') ?
                                $q->param('content-type') : '';

Around line 392, add this:

if (&conflict($File->{Opt}->{ContentType}, '(detect automatically)')) {
   my ($mode, $ct, $charset) =
      &parse_content_type($File, $File->{Opt}->{'ContentType'});

   $File->{Mode}            = $mode;
   $File->{ContentType}     = $ct;
   $File->{Charset}->{HTTP} = lc $charset;
}
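
To make the idea concrete, here is a rough, standalone sketch of how a 
Content-Type value picked in the form could be turned into a parse mode. 
It is not the validator's own parse_content_type; the function name 
sketch_parse_content_type, the %MODE_FOR table, the 'SGML'/'XML' mode 
labels and the SGML fallback for unknown types are all assumptions made 
for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table (an assumption, not taken from the validator
# source): media type => parse mode.
my %MODE_FOR = (
    'text/html'             => 'SGML',
    'application/xhtml+xml' => 'XML',
    'application/xml'       => 'XML',
    'text/xml'              => 'XML',
    'image/svg+xml'         => 'XML',
);

# Split a Content-Type value such as "text/html;charset=UTF-8" into
# (mode, media type, charset).  Unknown media types fall back to SGML
# mode in this sketch; the charset is '' when none is given.
sub sketch_parse_content_type {
    my ($value) = @_;
    $value = '' unless defined $value;

    # Lowercase everything, then separate the media type from parameters.
    my ($ct, @params) = split /\s*;\s*/, lc $value;
    $ct = '' unless defined $ct;
    $ct =~ s/^\s+|\s+$//g;

    # Pick up the first charset parameter, with or without quotes.
    my $charset = '';
    for my $param (@params) {
        if ($param =~ /^charset\s*=\s*"?([^"\s]+)"?$/) {
            $charset = $1;
            last;
        }
    }

    my $mode = exists $MODE_FOR{$ct} ? $MODE_FOR{$ct} : 'SGML';
    return ($mode, $ct, $charset);
}

my ($mode, $ct, $charset) =
    sketch_parse_content_type('text/html;charset=UTF-8');
print "$mode $ct $charset\n";
```

With a table like this, choosing application/xhtml+xml in the form would 
select XML parsing regardless of what the document itself looks like, 
which is the whole point of making the type explicit.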

Finally, in the markup for the direct input form, add this:

<p><label><code>Content-Type:</code>
<select name="content-type">
   <option value="text/html;charset=UTF-8" selected="selected">text/html;charset=UTF-8 (HTML only)</option>
   <option value="application/xhtml+xml">application/xhtml+xml (recommended for XHTML)</option>
   <option value="application/xml">application/xml</option>
   <option value="text/xml">text/xml (not recommended)</option>
   <option value="image/svg+xml">image/svg+xml</option>
</select></label></p>

This would probably also be useful for file upload and URI validation on 
the extended interfaces, although those forms would need a "(detect 
automatically)" option added.

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Wednesday, 8 February 2006 15:11:33 GMT
