Re: Custom DTD Support (was: Re: frameset and frame borders)

From: Terje Bless <link@pobox.com>
Date: Sat, 22 May 2004 21:49:24 +0200
To: W3C Validator <www-validator@w3.org>
Message-ID: <b02010203-1033-21215E9EAC2911D8B4B40030657B83E8@[193.157.66.23]>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:

>On Sat, 22 May 2004, Terje Bless wrote:
>
>>If possible, we would appreciate it if you could resubmit your document
>>for processing and let us know whether this has resolved the issue you
>>reported.
>
>The issue was resolved for the test case related to adding attributes
>into the Frameset DTD. The test case now validates.

Thank you.


>But for http://www.cs.tut.fi/~jkorpela/html/nobr.html there's still a
>problem, though a different one. Apparently the validator now recognizes
>the customized DTD, but it just reports that the page is not valid,
>followed by "Below are the results of attempting to parse this document
>with an SGML parser." followed by no results. (The page _is_ valid.)

Well, whether the page is considered Valid probably depends on how you choose
to apply the term in this context (see below). Either way, it is highly
unfortunate behaviour for the Validator to give the "Invalid" result page
without actually listing any errors. Thanks for reporting this!


>I remember having seen such a situation earlier, but I think the problem
>was fixed, and now it seems to have re-emerged.

That may have been with the WDG HTML Validator (see below).


>I think there used to be the problem that if I add new elements into the
>definition of %phrase (as I do, by introducing NOBR), some internal
>limit (GRPCNT, I think) in the validator prevented validation -
>presumably it was unable to process the DTD. And I vaguely remember this
>caused a situation like the above. But when I now tried removing
>ACRONYM, thereby making the number of elements the same as in HTML 4.01
>DTD, it did not help. When I removed both ACRONYM and DFN, validation
>was successful. I'm puzzled.

This is in fact exactly what is going on: the error triggered by your modified
DTD is that it exceeds the GRPCNT limit. Why removing ACRONYM alone didn't help
isn't yet clear to me; possibly the exceeded limit is in a different place than
one might initially expect, making the different elements unequal in this
regard.
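
As a rough illustration of how one extra name in %phrase can push some other
group over a limit: parameter entity references are expanded in place, so the
added name presumably counts in every group that includes %phrase, directly or
indirectly. The sketch below is Python, not anything the Validator runs. The
entity set is shortened, the limit of 6 is made up, and the counting is a
simplification of what an SGML parser actually does.

```python
import re

# Matches a parameter entity reference such as %phrase;
ENTITY_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")


def expand(group, entities):
    """Replace %name; references with their replacement text until none remain."""
    while ENTITY_REF.search(group):
        group = ENTITY_REF.sub(lambda m: entities[m.group(1)], group)
    return group


def count_tokens(group, entities):
    """Count the tokens of a flat group after entity expansion."""
    expanded = expand(group, entities)
    # Split on connectors (| , &), parentheses and whitespace.
    return len([tok for tok in re.split(r"[|,&\s()]+", expanded) if tok])


# Hypothetical, shortened entity set; NOBR is the added element.
entities = {
    "fontstyle": "TT | I | B",
    "phrase": "EM | STRONG | NOBR",
    "inline": "#PCDATA | %fontstyle; | %phrase;",
}

HYPOTHETICAL_LIMIT = 6  # illustrative only, not the HTML 4.01 value

n = count_tokens("%inline;", entities)
status = "exceeds" if n > HYPOTHETICAL_LIMIT else "is within"
print(f"%inline; expands to {n} tokens, which {status} the limit")
```

Here %phrase on its own stays well under the limit, while %inline does not,
which is the kind of "different place" effect that could make one element
matter more than another.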

The Validator isn't reporting this because we suppress errors located in
external entities (e.g. the External Subset)[0][1].

This is arguably the correct behaviour as that value is set in the SGML
Declaration and HTML 4.01 has an (unfortunately rather implicit) fixed SGML
Declaration. In theory you can override this with an SGMLDECL Declaration (from
the WebSGML Annex to SGML), but I wouldn't recommend it and I suspect the
Validator would not handle this very well.

The WDG HTML Validator (and, probably, Page Valet), IIRC, uses a modified SGML
Declaration that extends these limits somewhat; and I think Liam Quinn once
sent us the values he's used. Unfortunately I didn't have time at that point
to really investigate it and I was undecided on whether using a modified SGML
Declaration was the correct thing to do.

The W3C Markup Validator has always used the SGML Declaration from the HTML
4.01 Recommendation[2], so it's unlikely that document has ever passed there.


I'll look into the details of this issue, but as mentioned I'm uncertain as to
the correct course of action here. I would appreciate comments and opinions on
this and ways to address the issue.



[0] - Partially this is due to some spurious messages emitted by the SGML
      Parser for dubious (but not invalid) constructs in some W3C DTDs;
      and partially it's a conscious design decision related to the fact
      that the Validator is focussing on checking the part of documents
      authored by end users and not the DTDs. The latter is because the
      majority of DTDs are assumed to have been written by people capable
      of manually checking the DTD with an SGML Parser directly.

[1] - There are three very useful debugging options that we use during
      development that will reveal issues such as this. They are ";debug=1"
      to enable debugging output, ";esis=1" to show the raw ESIS output
      from onsgmls, and ";errors=1" to show the raw error (stderr) output
      of onsgmls.

      By appending these options to the CGI URL you will get the
      associated option's output.
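
      A minimal sketch of building such a URL, in Python. The endpoint
      "http://validator.w3.org/check" and its "uri" parameter are
      assumptions about the public service, and debug_url is a made-up
      helper name; only the three ";name=1" options come from the
      description above.

```python
from urllib.parse import quote

# Assumed address of the Validator's CGI script.
CHECK_URL = "http://validator.w3.org/check"


def debug_url(document_uri, debug=True, esis=False, errors=False):
    """Return a check URL with the selected debugging options appended."""
    url = CHECK_URL + "?uri=" + quote(document_uri, safe="")
    for name, enabled in (("debug", debug), ("esis", esis), ("errors", errors)):
        if enabled:
            # Options are appended with ";" as the separator, as described above.
            url += ";" + name + "=1"
    return url


# Ask for debugging output plus the raw stderr output of onsgmls.
print(debug_url("http://www.cs.tut.fi/~jkorpela/html/nobr.html", errors=True))
```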

[2] - This has been virtually unchanged for all HTML Recommendations
      published by the W3C, and the Validator has always used the same
      one (modulo some Document Charset issues).


- -- 
"If at first you don't succeed, keep shooting."  -- monk

-----BEGIN PGP SIGNATURE-----
Version: PGP SDK 3.0.3

iQA/AwUBQK+uwqPyPrIkdfXsEQK8WgCfYoPKAJh0hNY5u0mI3vm7+tFkSUUAoKZj
izJZlQ3nUWvrjAL+kHq6y4+B
=9b/x
-----END PGP SIGNATURE-----
Received on Saturday, 22 May 2004 15:49:28 GMT
