Re: 0.7.0 beta1 issues with "-//RHBNC//DTD HTML 4.01 Augmented//EN" (Was: [ANN] Beta test of the W3C Markup Validator (0.7.0 beta 1)) from olivier Thereaux on 2005-07-14 (www-validator@w3.org from July 2005)

From: olivier Thereaux <ot@w3.org>
Date: Thu, 14 Jul 2005 16:13:51 +0900
To: Philip TAYLOR <P.Taylor@Rhul.Ac.Uk>
Cc: www-validator@w3.org
Message-Id: <80207AB1-7809-4F19-BFC5-D89E0CF79F04@w3.org>

Hello, Philip,

On 13 Jul 2005, at 20:04, Philip TAYLOR wrote:
> Many thanks for the feedback, Olivier.  I am most grateful
> to you for pointing out the defects in my DTD, which I shall
> fix immediately.

Great. Note that unless I hear objections in the next few days, I am
likely to re-add the Latin 1 entities FPI to the SGML catalogue, but  
fixing your DTD will do no harm.

> As regards the DOCTYPE, however, and the
> disambiguation aspect :
>
>     > The MIME Media Type (text/html) for this document is used to
>     > serve both SGML and XML based documents, and it is not possible
>     > to disambiguate it based on the DOCTYPE Declaration in your
>     > document. Parsing will continue in SGML mode.
>
> this does seem a slightly worrying aspect.  Presumably your
> "types database" is hard-coded, and knows only about
> W3C standard DOCTYPEs;

Right.

> do you think there is any mileage
> in allowing some "disambiguation pragmat" in non-standard
> DTDs, and if so, which is the right forum on which to raise
> this issue ?

This is a tough question, and I am probably by far the worst person  
on this list to answer it, but let's give it a try anyway. Frankly,  
even when talking about standard DTDs, we are in non-normative
territory. So it should not be a surprise that for non-standard DTDs,
the situation is even fuzzier...

- The text/html RFC is informative and does not say whether such
documents should be parsed as SGML or as XML.
- There is no clear way to identify a DTD as an SGML or an XML one.
Well, as far as I know, XML DTDs must follow rules that are stricter
than those for SGML DTDs, so in a way you could use that.
But I might be wrong. And even if I am right, that's far-fetched.
- Even for "standard" XHTML document types, I am not aware of a  
normative clarification of how content served as text/html should be  
parsed. And that's beyond the point of this thread, see:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1500

The "informative" consensus, however, seems to be that text/html is  
mostly for SGML applications, and the fact that XHTML can be served  
as such is just a necessary evil ("necessary" and "evil" being, as a  
matter of fact, both subject to endless arguing) - see
http://www.w3.org/TR/xhtml-media-types/#text-html. As a result, I think that
what the validator does with documents served as text/html and with  
DTDs it doesn't know - parsing them as SGML - is correct.

But that is still a heuristic...
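
To make that fallback concrete, here is a minimal sketch in Python of
the decision for a document served as text/html. It is not the
validator's actual code: the table KNOWN_DOCTYPES, the function
choose_parse_mode and the handful of public identifiers listed are
hypothetical and only there for illustration.

```python
# Illustrative sketch only; not taken from the validator's source.
# Hypothetical hard-coded "types database": public identifier -> parse mode.
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "SGML",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "SGML",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XML",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XML",
}

FALLBACK_WARNING = (
    "The MIME Media Type (text/html) is used to serve both SGML and XML "
    "based documents, and the DOCTYPE does not disambiguate it. "
    "Parsing will continue in SGML mode."
)


def choose_parse_mode(public_id):
    """Pick a parse mode for a text/html document.

    Returns a (mode, warning) pair. The warning is None when the
    DOCTYPE is in the table, and a message when we fall back to SGML.
    """
    mode = KNOWN_DOCTYPES.get(public_id)
    if mode is not None:
        return mode, None
    # Unknown or missing DOCTYPE: assume SGML, since text/html is
    # mostly used for SGML applications.
    return "SGML", FALLBACK_WARNING


if __name__ == "__main__":
    print(choose_parse_mode("-//RHBNC//DTD HTML 4.01 Augmented//EN"))
```

With such a table, your DTD is simply not found, so the document is
parsed as SGML and the warning is shown. A "disambiguation pragmat"
would amount to a second lookup before the fallback, which is exactly
the part nothing normative currently defines.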

And as for whom people should turn to for an actual answer, I guess
"no one"... The HTML WG could say something about it, but frankly,  
they are already busy enough and the text/html situation is already  
thorny enough with just the W3C standard DTDs that I can't imagine  
they'd like to take a position on non-standard DTDs... But then
again, I might be wrong.

Hope this helps,
-- 
olivier

Received on Thursday, 14 July 2005 07:13:58 UTC