Re: Role of DTDs in the validator

On Mar 17, 2009, at 19:20, olivier Thereaux wrote:

> On 9-Mar-09, at 10:35 AM, Henri Sivonen wrote:
>
>> On Mar 9, 2009, at 14:49, olivier Thereaux wrote:
>>
>>> Not trivial, but feasible. I invite you to review (with the WG)  
>>> the validator's development roadmap, which looks into this question:
>>> http://qa-dev.w3.org/wmvs/HEAD/todo.html#roadmap
>>
>> I notice that DTD validation is rather prominent in the next gen  
>> picture.
>
> Mostly for legacy document types, yes. Note, and I think it is  
> really important, that for now, the "next gen picture" is merely my  
> personal brain dump. You are the first person to give any feedback.  
> Consider it work in progress, and not  a vetted w3c statement.

OK.

> FWIW, I doubt that there can be a w3c-wide agreement on this. The  
> very disparate communities that form the W3C aren't likely to agree  
> on whether DTDs are "good" or "bad", on which schema language to use  
> (if at all), etc.

Which communities still want to use DTDs in their own right as opposed
to using them because DTDs are what validator.w3.org uses?

It seems to me that the XML community has moved on from DTDs. Maybe  
they don't agree on XSD vs. RNG, but isn't agreement on "not DTD" the  
prevailing attitude these days?

The HTML5 community has moved on from DTDs.

> Digression closed. All that said, I'm in perfect agreement that  
> there could, and should be a push away from DTDs wherever it makes  
> sense.

I think the key question is whether there are areas where pushing away  
from DTDs doesn't make sense. :-)

> One bit of code I did today changes the way validator.w3.org handles  
> doctype-less SVG: it used to be passed to the opensp (DTD) engine  
> and I'm experimenting sending it to validator.nu.
>
> http://qa-dev.w3.org/wmvs/HEAD/check?uri=http%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F0%2F0e%2FInkscape_logo_2.svg

Has this change been reversed during the past month? The outcome  
doesn't contain an error about class="black;", which is what  
Validator.nu says.
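
For reference, the routing as I understand it from your description could be sketched roughly like this in Python. This is only my reading of the behavior, not the actual validator.w3.org code; the function name choose_engine, the 4096-byte sniffing window and the engine labels are made up for illustration.

```python
import re

# Hypothetical: look for a doctype declaration near the start of the document.
DOCTYPE = re.compile(rb"<!DOCTYPE\s", re.IGNORECASE)


def choose_engine(document: bytes, media_type: str) -> str:
    """Pick a validation back end for a document (illustrative only)."""
    has_doctype = DOCTYPE.search(document[:4096]) is not None
    # Doctype-less SVG goes to the non-DTD engine.
    if media_type == "image/svg+xml" and not has_doctype:
        return "validator.nu"
    # Everything else, including SVG with a doctype, stays on the DTD engine.
    return "opensp"
```

If the change is still in place, the Inkscape logo above (no doctype) should end up on the validator.nu branch of such a dispatch.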

> What about SVG documents with a doctype? I don't know... For now I  
> kept the code that passes them to a DTD engine...

Is there a reason why an informed user would want to use DTD-based  
validation over RELAX NG-based validation for SVG 1.1 if the goal of  
the user is actual quality assurance (as opposed to wanting the
validator to say that the document is correct)?

Is there any case where quality assurance of newly-created content  
needs to validate against the SVG 1.0 conformance definition instead  
of validating against the SVG 1.1 Full conformance definition?

>> Considering that RELAX NG or RELAX NG plus something else (Java,  
>> Schematron) validation exists for HTML 4.01, SVG 1.1 and MathML 2.0  
>> and newer specs such as SVG 1.2, HTML 5 and MathML 3.0 either don't  
>> have a DTD or have a DTD as the less preferred schema, I wonder  
>> what the purpose of DTD-based validation in "next gen" is.
>>
>> Is keeping providing QA tools for authors who create HTML 2.0, 3.2,  
>> 4.0 or ISO HTML documents a goal[1]? Is not introducing more  
>> accuracy to HTML 4.01 and SVG 1.1 validation so that previously  
>> "valid" pages aren't found invalid a goal? Is maintaining support  
>> for custom DTDs in SGML or XML a goal?
>>
>> [1] http://lists.w3.org/Archives/Public/www-validator/2005Sep/0052.html
>
> Again, I'm going to have to reply with my own, personal opinion  
> rather than w3c's. I believe that:
>
> * There is little point in making DTDs for newly developed  
> languages. I am not an expert, but given the limitations of DTDs and  
> given how Web languages tend towards mix-and-match (with or without  
> namespaces), DTDs just don't seem to fit.

I agree. It's sad that ARIA additions to legacy DTDs are being  
discussed on this list when the system deployed at validator.w3.org  
supports ARIA (albeit a draft a year out of date) better than DTDs  
ever could--if you use the HTML doctype: <!DOCTYPE html>.

> * There is however a large portion of the "document" world still  
> happily using DTDs for their documents - in the publishing industry  
> and academia. If there is a reason to keep support for DTDs, this is  
> it.

I thought the "document" world (the world that uses TEI, DocBook and  
languages like those) had moved to RELAX NG.

In any case, why is the document world relevant to the W3C Validator?  
The W3C Validator validates resources that are available to the public  
via HTTP while the "document" world tends to live on local file  
systems behind firewalls.

> * We don't want another Knuth incident, or 1000. Any change in  
> validation of "legacy" documents will have to be very careful and  
> well explained. I do agree however that features brought by relaxng 
> +schematron+... such as checking attribute values would be very  
> desirable. It's not about "freezing" the legacy validation with  
> DTDs, it's about managing change.

I wonder whether the Knuth incident would have been viewed differently
if it had involved just a random non-famous person.

Would it make sense for the W3C validator team to allocate resources  
to support a random person authoring quirks-mode pages using an  
ancient flavor of HTML that has never been a W3C REC or even on the  
REC track?

If there were superior code paths compared to OpenSP for everything  
from HTML 4.01 onwards, would it really make sense to let OpenSP  
dictate the overall architecture of the validator in order to keep  
support for HTML 4.0 (as opposed to 4.01), 3.2, 2.0, ISO HTML and  
various Netscape and O'Reilly variants around?

Ignoring DTDs for a moment, why should the W3C Validator facilitate  
the creation of new quirks-mode content? And why should validator  
development be concerned about validating pages that aren't being  
authored (from scratch or updated) today?

> * Finally, my foolish hope is that regardless of engine changes, a  
> lot of the work done for validator.w3.org on usability, error  
> message explanations, pre-parsing, handling of character encodings  
> etc. will not be lost.

I will, of course, promote Validator.nu internals as the replacement  
for DTD-based validation. In that light:

  * The RELAX NG error messages in Validator.nu suck, but I think  
that's not a sufficient reason to switch over to e.g. MSV, which uses  
deep recursion more, which is an issue in a public-facing Web service.  
I've been hoping Jing upstream would resolve the message issue, but
perhaps I should spend time on merging the patch from
http://code.google.com/p/jing-trang/issues/detail?id=35
into the Validator.nu branch.

  * What kind of pre-parsing does a validator need and why? (The only  
pre-parsing Validator.nu does is the HTML <meta charset> scan--per  
spec.)

  * Validator.nu has pretty comprehensive character encoding support  
with the intentional limitations that
   1) It doesn't try to detect the encoding in ways that HTML5 and XML  
don't prescribe. (Essential for proper QA preflight for non-validator  
consumers.)
   2) It whines about encodings that aren't commonly supported.
   3) It doesn't support encodings that aren't rough ASCII supersets  
except UTF-16. (Such encodings are considered harmful.)
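
To make the three limitations above a bit more concrete, here is a rough Python sketch of that kind of policy. It is not the Validator.nu code (which is Java), and the details are assumptions for illustration: the COMMON set, the PROBE string, the classify() name and the result labels are all invented, and "rough ASCII superset" is approximated by checking that a handful of ASCII markup bytes decode to themselves.

```python
import codecs

# Hypothetical list of "commonly supported" encodings; the real list differs.
COMMON = {codecs.lookup(l).name for l in ("utf-8", "iso-8859-1", "windows-1252")}
# UTF-16 is the one allowed exception to the ASCII-superset rule.
UTF16 = {codecs.lookup(l).name for l in ("utf-16", "utf-16le", "utf-16be")}

# ASCII bytes that markup depends on; a rough superset must preserve them.
PROBE = b"<!DOCTYPE html><meta charset='x'> abcXYZ 0123456789 /\">"


def is_rough_ascii_superset(label: str) -> bool:
    """True if the probe bytes decode to the same text as they do in ASCII."""
    try:
        return PROBE.decode(label) == PROBE.decode("ascii")
    except (LookupError, UnicodeError):
        return False


def classify(label: str) -> str:
    """Return 'unknown', 'rejected', 'warn' or 'supported' for a declared label."""
    try:
        name = codecs.lookup(label).name
    except LookupError:
        return "unknown"
    if name in UTF16:
        return "supported"
    if not is_rough_ascii_superset(label):
        return "rejected"   # limitation 3: not a rough ASCII superset
    if name not in COMMON:
        return "warn"       # limitation 2: whine about uncommon encodings
    return "supported"
```

Note that classify() only looks at the declared label and never inspects the document bytes to guess an encoding, which is the point of limitation 1.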

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Friday, 17 April 2009 12:08:14 UTC