- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 17 Apr 2009 15:07:24 +0300
- To: www-validator Community <www-validator@w3.org>
On Mar 17, 2009, at 19:20, olivier Thereaux wrote:

> On 9-Mar-09, at 10:35 AM, Henri Sivonen wrote:
>
>> On Mar 9, 2009, at 14:49, olivier Thereaux wrote:
>>
>>> Not trivial, but feasible. I invite you to review (with the WG)
>>> the validator's development roadmap, which looks into this question:
>>> http://qa-dev.w3.org/wmvs/HEAD/todo.html#roadmap
>>
>> I notice that DTD validation is rather prominent in the next gen
>> picture.
>
> Mostly for legacy document types, yes. Note, and I think it is
> really important, that for now, the "next gen picture" is merely my
> personal brain dump. You are the first person to give any feedback.
> Consider it work in progress, and not a vetted W3C statement.

OK.

> FWIW, I doubt that there can be a W3C-wide agreement on this. The
> very disparate communities that form the W3C aren't likely to agree
> on whether DTDs are "good" or "bad", on which schema language to use
> (if at all), etc.

Which communities still want to use DTDs in their own right, as
opposed to using them because DTDs are what validator.w3.org uses? It
seems to me that the XML community has moved on from DTDs. Maybe they
don't agree on XSD vs. RNG, but isn't agreement on "not DTD" the
prevailing attitude these days? The HTML5 community has moved on from
DTDs.

> Digression closed. All that said, I'm in perfect agreement that
> there could, and should, be a push away from DTDs wherever it makes
> sense.

I think the key question is whether there are areas where pushing
away from DTDs doesn't make sense. :-)

> One bit of code I did today changes the way validator.w3.org handles
> doctype-less SVG: it used to be passed to the opensp (DTD) engine,
> and I'm experimenting with sending it to validator.nu.
>
> http://qa-dev.w3.org/wmvs/HEAD/check?uri=http%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F0%2F0e%2FInkscape_logo_2.svg

Has this change been reversed during the past month? The outcome
doesn't contain an error about class="black;", which Validator.nu
does report.

> What about SVG documents with a doctype? I don't know... For now I
> kept the code that passes them to a DTD engine...

Is there a reason why an informed user would want to use DTD-based
validation over RELAX NG-based validation for SVG 1.1 if the goal of
the user is actual quality assurance (as opposed to wanting the
validator to say that the document is correct)?

Is there any case where quality assurance of newly-created content
needs to validate against the SVG 1.0 conformance definition instead
of validating against the SVG 1.1 Full conformance definition?
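For concreteness, RELAX NG-based validation of an SVG file can be
driven through Jing's embedding API, which is also the engine behind
Validator.nu's RELAX NG support. The sketch below only illustrates
that API; it is not how validator.w3.org or Validator.nu is actually
wired up, and the class name and the svg11.rng schema path are
placeholders I made up.

    import com.thaiopensource.util.PropertyMapBuilder;
    import com.thaiopensource.validate.ValidateProperty;
    import com.thaiopensource.validate.ValidationDriver;
    import com.thaiopensource.xml.sax.ErrorHandlerImpl;
    import org.xml.sax.InputSource;

    public class SvgRelaxNgCheck {
        public static void main(String[] args) throws Exception {
            // Send validation errors to stderr.
            PropertyMapBuilder properties = new PropertyMapBuilder();
            properties.put(ValidateProperty.ERROR_HANDLER,
                           new ErrorHandlerImpl(System.err));
            ValidationDriver driver =
                new ValidationDriver(properties.toPropertyMap());

            // Load an SVG 1.1 RELAX NG schema (placeholder path).
            if (!driver.loadSchema(
                    ValidationDriver.uriOrFileInputSource("svg11.rng"))) {
                System.err.println("Could not load schema");
                return;
            }

            // Validate the document named on the command line.
            InputSource doc = ValidationDriver.uriOrFileInputSource(args[0]);
            System.out.println(driver.validate(doc) ? "valid" : "invalid");
        }
    }

The attraction over OpenSP is that such a schema can constrain
attribute values and datatypes, which DTDs largely cannot.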
>> Considering that RELAX NG or RELAX NG plus something else (Java,
>> Schematron) validation exists for HTML 4.01, SVG 1.1 and MathML 2.0,
>> and newer specs such as SVG 1.2, HTML 5 and MathML 3.0 either don't
>> have a DTD or have a DTD as the less preferred schema, I wonder
>> what the purpose of DTD-based validation in "next gen" is.
>>
>> Is continuing to provide QA tools for authors who create HTML 2.0,
>> 3.2, 4.0 or ISO HTML documents a goal[1]? Is not introducing more
>> accuracy to HTML 4.01 and SVG 1.1 validation, so that previously
>> "valid" pages aren't found invalid, a goal? Is maintaining support
>> for custom DTDs in SGML or XML a goal?
>>
>> [1] http://lists.w3.org/Archives/Public/www-validator/2005Sep/0052.html
>
> Again, I'm going to have to reply with my own, personal opinion
> rather than W3C's. I believe that:
>
> * There is little point in making DTDs for newly developed
> languages. I am not an expert, but given the limitations of DTDs and
> given how Web languages tend towards mix-and-match (with or without
> namespaces), DTDs just don't seem to fit.

I agree. It's sad that ARIA additions to legacy DTDs are being
discussed on this list when the system deployed at validator.w3.org
supports ARIA (albeit a draft a year out of date) better than DTDs
ever could--if you use the HTML doctype: <!DOCTYPE html>.

> * There is however a large portion of the "document" world still
> happily using DTDs for their documents - in the publishing industry
> and academia. If there is a reason to keep support for DTDs, this is
> it.

I thought the "document" world (the world that uses TEI, DocBook and
languages like those) had moved to RELAX NG.

In any case, why is the document world relevant to the W3C Validator?
The W3C Validator validates resources that are available to the
public via HTTP, while the "document" world tends to live on local
file systems behind firewalls.

> * We don't want another Knuth incident, or 1000. Any change in
> validation of "legacy" documents will have to be very careful and
> well explained. I do agree however that features brought by relaxng
> +schematron+... such as checking attribute values would be very
> desirable. It's not about "freezing" the legacy validation with
> DTDs, it's about managing change.

I wonder if the Knuth incident would have been viewed differently if
it had involved just a random non-famous person. Would it make sense
for the W3C validator team to allocate resources to support a random
person authoring quirks-mode pages using an ancient flavor of HTML
that has never been a W3C REC or even on the REC track?

If there were superior code paths compared to OpenSP for everything
from HTML 4.01 onwards, would it really make sense to let OpenSP
dictate the overall architecture of the validator in order to keep
support for HTML 4.0 (as opposed to 4.01), 3.2, 2.0, ISO HTML and
various Netscape and O'Reilly variants around?

Ignoring DTDs for a moment, why should the W3C Validator facilitate
the creation of new quirks-mode content? And why should validator
development be concerned about validating pages that aren't being
authored (from scratch or updated) today?

> * Finally, my foolish hope is that regardless of engine changes, a
> lot of the work done for validator.w3.org on usability, error
> message explanations, pre-parsing, handling of character encodings
> etc. will not be lost.

I will, of course, promote Validator.nu internals as the replacement
for DTD-based validation. In that light:

* The RELAX NG error messages in Validator.nu suck, but I think
that's not a sufficient reason to switch over to e.g. MSV, which
relies more heavily on deep recursion--an issue in a public-facing
Web service. I've been hoping Jing upstream would resolve the message
issue, but perhaps I should spend time on merging the patch from
http://code.google.com/p/jing-trang/issues/detail?id=35 into the
Validator.nu branch.

* What kind of pre-parsing does a validator need, and why? (The only
pre-parsing Validator.nu does is the HTML <meta charset> scan--per
spec; a rough sketch of that idea follows below.)

* Validator.nu has pretty comprehensive character encoding support,
with the intentional limitations that 1) it doesn't try to detect the
encoding in ways that HTML5 and XML don't prescribe (essential for
proper QA preflight for non-validator consumers), 2) it whines about
encodings that aren't commonly supported, and 3) it doesn't support
encodings that aren't rough ASCII supersets, except UTF-16. (Such
encodings are considered harmful.)
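To make the prescan point concrete: the idea is simply to peek at the
first bytes of the document before full parsing and pick up a
declared encoding. The snippet below is only a toy approximation of
the spec's byte-level prescan state machine (which also deals with
comments, attribute quoting and the http-equiv/content form); the
class name and the regex shortcut are mine, not Validator.nu code.

    import java.nio.charset.Charset;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    final class MetaCharsetPrescanSketch {
        // Loose pattern for <meta ... charset=foo> in the first bytes.
        private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9._:-]*)",
            Pattern.CASE_INSENSITIVE);

        static Charset prescan(byte[] head) {
            // Look at no more than the first 1024 bytes, decoded as ASCII.
            int limit = Math.min(head.length, 1024);
            String ascii =
                new String(head, 0, limit, Charset.forName("US-ASCII"));
            Matcher m = META_CHARSET.matcher(ascii);
            if (m.find() && Charset.isSupported(m.group(1))) {
                return Charset.forName(m.group(1));
            }
            return null; // no declaration found in the prescanned bytes
        }
    }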
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 17 April 2009 12:08:14 UTC