Re: 0.7.0 beta1 issues with "-//RHBNC//DTD HTML 4.01 Augmented//EN" (Was: [ANN] Beta test of the W3C Markup Validator (0.7.0 beta 1))

Hi Philip,

Thanks for checking the beta validator.

On Jul 12, 2005, at 22:06, Philip TAYLOR wrote:
> ->
> 	Unknown Document Type and Parse Mode!

I checked the part of the code that issues this warning. It is only
emitted when:
- the pre-parsing found a DOCTYPE
- and the content-type cannot disambiguate whether to use SGML or XML
mode (i.e., text/html)
- ... but the doctype is not in our types database with info to
disambiguate the mode

So instead of:

    The MIME Media Type (text/html) for this document is used to serve both
    SGML and XML based documents, and no DOCTYPE Declaration was found to
    disambiguate it. Parsing will continue in SGML mode and with a fallback
    DOCTYPE similar to HTML 4.01 Transitional.

I think it should be something like:

    The MIME Media Type (text/html) for this document is used to serve both
    SGML and XML based documents, and it is not possible to disambiguate it
    based on the DOCTYPE Declaration in your document. Parsing will
    continue in SGML mode.

I think Terje initially wrote this. He's really busy these days, but
I'll try to see if he can give it a look.
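
For the record, the three conditions listed earlier could be sketched roughly as follows. This is only an illustration in Python, not the validator's actual code: the function name, the table names and the two database entries are all made up for the example.

```python
# Hypothetical sketch of when the "Unknown Document Type and Parse Mode"
# warning would fire. Names and table contents are illustrative only.

# Media types that may be used to serve both SGML and XML documents.
AMBIGUOUS_TYPES = {"text/html"}

# Illustrative stand-in for the doctype database: FPI -> parse mode.
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "SGML",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XML",
}


def needs_unknown_mode_warning(content_type, doctype_fpi):
    """Return True only when all three conditions hold."""
    found_doctype = doctype_fpi is not None          # pre-parse found a DOCTYPE
    ambiguous = content_type in AMBIGUOUS_TYPES      # media type cannot decide
    known = doctype_fpi in KNOWN_DOCTYPES            # database can decide
    return found_doctype and ambiguous and not known
```

With this sketch, a text/html document carrying the "-//RHBNC//DTD HTML 4.01 Augmented//EN" doctype meets all three conditions, while a document with no DOCTYPE at all does not, which is why the "no DOCTYPE Declaration was found" wording is wrong for this case.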

Now for the other issue...

>    I should add that it commences :
> 	  1: <!DOCTYPE HTML PUBLIC "-//RHBNC//DTD HTML 4.01 Augmented//EN"
> 	  2: 	""
> 	  3: >
> Error Line 76 column 27: general entity "nbsp" not defined and no 
> default entity.
> This diagnostic is not issued by the current validator

This is SGML territory, so hopefully someone will be able to confirm, 
or correct, my understanding of the situation.

* You are using a "custom" DTD, based on a copy of the HTML 4.01 DTD, 
and which you're publishing at:

* In that DTD, the reference to entities is made (as in HTML 4.01) with
relative URIs, e.g.:
    "-//W3C//ENTITIES Latin1//EN//HTML"

But there is nothing at
Isn't that a mistake?

Now, the reason why the "usual" validator (v0.6.7) does not complain
about this is that the SGML catalogue it uses knows how to dereference
the "-//W3C//ENTITIES Latin1//EN//HTML" FPI, whereas the "new"
validator has a catalogue that only knows "-//W3C//ENTITIES Latin
1//EN//HTML". The "Latin1" entry was most likely a victim of a cleanup
of that catalogue. The cleanup was a bit zealous, and it's possible that
this removal was a mistake. Quite probable, actually: the DTD in the
HTML 4.01 spec uses the "Latin1" FPI, not "Latin 1". Could anyone among
our SGML gurus confirm?
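
To make the "Latin1" versus "Latin 1" mismatch concrete, here is a small Python sketch of a catalogue lookup. It assumes that a PUBLIC entry is matched by comparing the FPI string exactly, and the two one-line catalogues and the "HTMLlat1.ent" file name are illustrative stand-ins, not copies of the validator's real catalogue files.

```python
import re

# One catalogue entry of the form: PUBLIC "fpi" "system-id"
PUBLIC_ENTRY = re.compile(r'^\s*PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*$')


def parse_catalog(text):
    """Collect PUBLIC entries from catalogue text into an FPI -> file map."""
    entries = {}
    for line in text.splitlines():
        match = PUBLIC_ENTRY.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def resolve(entries, fpi):
    """Return the file mapped to this FPI, or None if there is no entry."""
    return entries.get(fpi)


# Illustrative catalogues (assumed contents, for the example only).
OLD_CATALOG = 'PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML" "HTMLlat1.ent"\n'
NEW_CATALOG = 'PUBLIC "-//W3C//ENTITIES Latin 1//EN//HTML" "HTMLlat1.ent"\n'

# The FPI referenced by the DTD.
WANTED = "-//W3C//ENTITIES Latin1//EN//HTML"

old_result = resolve(parse_catalog(OLD_CATALOG), WANTED)
new_result = resolve(parse_catalog(NEW_CATALOG), WANTED)
```

Under that assumption the old catalogue resolves the FPI and the new one does not, so the entity set is never loaded and "nbsp" ends up undefined.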


Received on Wednesday, 13 July 2005 05:05:17 UTC