
Re: Role of DTDs in the validator (was: Re: Validating SVG+RDF)

From: olivier Thereaux <ot@w3.org>
Date: Tue, 17 Mar 2009 13:20:08 -0400
To: Henri Sivonen <hsivonen@iki.fi>
Message-Id: <21691F5D-2A68-4612-AC61-1C56153EEFF7@w3.org>
Cc: www-validator Community <www-validator@w3.org>

Hi Henri,

Thanks for elevating the debate. Your questions are very relevant here.

On 9-Mar-09, at 10:35 AM, Henri Sivonen wrote:

> On Mar 9, 2009, at 14:49, olivier Thereaux wrote:
>> Not trivial, but feasible. I invite you to review (with the WG) the  
>> validator's development roadmap, which looks into this question:
>> http://qa-dev.w3.org/wmvs/HEAD/todo.html#roadmap
> I notice that DTD validation is rather prominent in the next gen  
> picture.

Mostly for legacy document types, yes. Note, and I think it is really  
important, that for now, the "next gen picture" is merely my personal  
brain dump. You are the first person to give any feedback. Consider it  
work in progress, and not a vetted W3C statement.

FWIW, I doubt that there can be a W3C-wide agreement on this. The very
disparate communities that form the W3C aren't likely to agree on
whether DTDs are "good" or "bad", on which schema language to use (if
any), etc. That's why IMHO the job of making a validator that
caters to everyone is a blindfolded tightrope walk. One is at risk
of pissing off one community or another with every choice, and every
choice is motivated by a lot of guesswork.

Digression closed. All that said, I'm in perfect agreement that there
could, and should, be a push away from DTDs wherever it makes sense.
One bit of code I wrote today changes the way validator.w3.org handles
doctype-less SVG: it used to be passed to the opensp (DTD) engine, and
I'm now experimenting with sending it to validator.nu.

What about SVG documents with a doctype? I don't know... For now I
kept the code that passes them to a DTD engine...
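
To make the routing described above concrete, here is a rough sketch in
Python. It is not the validator's actual code: the function name
choose_engine, the engine labels, and the regex-based detection are all
assumptions made for illustration, and anything that is not
doctype-less SVG is simply sent to the DTD engine here.

```python
import re

# Hypothetical labels for the two back ends mentioned in this message.
ENGINE_DTD = "opensp"
ENGINE_NU = "validator.nu"

_DOCTYPE_RE = re.compile(r"<!DOCTYPE\s", re.IGNORECASE)
# First element start tag, with an optional namespace prefix (e.g. svg:svg).
_ROOT_RE = re.compile(r"<\s*(?:[A-Za-z_][\w.-]*:)?([A-Za-z_][\w.-]*)")


def strip_prolog(text):
    """Drop comments and processing instructions (incl. the XML declaration)."""
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    text = re.sub(r"<\?.*?\?>", "", text, flags=re.DOTALL)
    return text


def choose_engine(document):
    """Pick a validation engine for a document given as a string.

    Simplification: this is pattern matching, not real parsing, so unusual
    prologs (e.g. a large internal DTD subset) are only roughly handled.
    """
    body = strip_prolog(document)
    has_doctype = _DOCTYPE_RE.search(body) is not None
    # Remove the doctype declaration so it is not mistaken for the root tag.
    without_doctype = re.sub(r"<!DOCTYPE[^>]*>", "", body, flags=re.IGNORECASE)
    match = _ROOT_RE.search(without_doctype)
    root = match.group(1) if match else None

    if root == "svg" and not has_doctype:
        # New, experimental behaviour: doctype-less SVG goes to validator.nu.
        return ENGINE_NU
    # SVG with a doctype (and, in this sketch, everything else) stays on DTDs.
    return ENGINE_DTD
```

The point of keeping the decision in one small function is that the
open question (what to do with SVG that does carry a doctype) becomes a
one-line change once it is answered.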

> Considering that RELAX NG or RELAX NG plus something else (Java,  
> Schematron) validation exists for HTML 4.01, SVG 1.1 and MathML 2.0  
> and newer specs such as SVG 1.2, HTML 5 and MathML 3.0 either don't  
> have a DTD or have a DTD as the less preferred schema, I wonder what  
> the purpose of DTD-based validation in "next gen" is.
> Is keeping providing QA tools for authors who create HTML 2.0, 3.2,  
> 4.0 or ISO HTML documents a goal[1]? Is not introducing more  
> accuracy to HTML 4.01 and SVG 1.1 validation so that previously  
> "valid" pages aren't found invalid a goal? Is maintaining support  
> for custom DTDs in SGML or XML a goal?
> [1] http://lists.w3.org/Archives/Public/www-validator/2005Sep/0052.html

Again, I'm going to have to reply with my own, personal opinion rather
than the W3C's. I believe that:

* There is little point in making DTDs for newly developed languages.  
I am not an expert, but given the limitations of DTDs and given how  
Web languages tend towards mix-and-match (with or without namespaces),  
DTDs just don't seem to fit.

* There is however a large portion of the "document" world still  
happily using DTDs for their documents - in the publishing industry  
and academia. If there is a reason to keep support for DTDs, this is it.

* We don't want another Knuth incident, or 1000. Any change in
validation of "legacy" documents will have to be very careful and well
explained. I do agree however that features brought by
relaxng+schematron+... such as checking attribute values would be very
desirable. It's not about "freezing" the legacy validation with DTDs,
it's about managing change.

* Finally, my foolish hope is that regardless of engine changes, a lot  
of the work done for validator.w3.org on usability, error message  
explanations, pre-parsing, handling of character encodings etc. will  
not be lost.

olivier Thereaux
http://www.w3.org/People/olivier - http://artbeat.me/ - http://yoda.zoy.org/
Received on Tuesday, 17 March 2009 17:20:18 UTC
