- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Tue, 03 Oct 2006 10:46:59 +0900
- To: www-validator@w3.org
Dear Validator Team,

These are some recent experiences with the WWW Markup Validator, and some suggestions on how to improve it.

It is great that the validator can now also be used for validating arbitrary XML files, but this validation experience is made unnecessarily difficult. The file I'm trying to validate is at http://www.sw.it.aoyama.ac.jp/2006/PB2/examples/book/book.xml, but I'm mostly talking about validating this same document from a file on my computer.

First, with file upload, I get a very short indication of what's wrong, and no chance to fix (read: override) it. The error message is as follows:

    Sorry, I am unable to validate this document because on line 10-14, 17, 19, 23-33, 35-41, 44-51, 54-57 it contained one or more bytes that I cannot interpret as us-ascii (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

Tracing this with Ethereal, it is clear that this behavior is essentially correct, because Opera uploads this file with a MIME type of text/xml. But why are overrides available when validating a URI, such as at http://validator.w3.org/check?uri=http%3A%2F%2Fwww.sw.it.aoyama.ac.jp%2F2006%2FPB2%2Fexamples%2Fbook%2Fbook.xml (which has exactly the same problem, namely that our server sends out the document as text/xml, which I'll fix as soon as I have given you a chance to compare things), while no overrides are provided for file upload? With current browsers, the MIME types and charsets sent for uploaded files are at least as uncontrollable by the user as they are for servers. Adding the overrides should be very easy; please do so.

The second problem happens when I use direct validation. What I get is the following error message:

    The MIME Media Type () for this document is used to serve both SGML and XML based documents, and it is not possible to disambiguate it based on the DOCTYPE Declaration in your document. Parsing will continue in SGML mode.

    This page is not Valid http://www.sw.it.aoyama.ac.jp/2006/PB2/examples/book/book.dtd!

    Below are the results of attempting to parse this document with an SGML parser.

    [followed by no such results at all]

I get the same results from the extended interface.

There are a number of problems with this behavior, all of which can be fixed easily, and except for the first and the last one, any single fix would fix the basic problem:

- Don't talk about MIME types (there was none in the Ethereal trace; multipart/form-data doesn't use them for individual form fields). Explain the problem in a way the user can understand and address.
- A document starting with "<?xml" can easily be guessed to be XML rather than SGML.
- In this day and age of XML, making SGML the default seems terribly outdated, even more so because XML is W3C's own technology.
- Since you know that you may be unable to tell whether a document is XML or SGML, provide a switch that lets the user tell you.
- If you validate as SGML, please make sure you actually do so and produce an actual error message, even if it's just something like '"<?xml": Document can't start with PI before DOCTYPE' or some such (not sure that's the right error message, though).

Many thanks in advance for your help.

Regards, Martin.

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
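The suggestion in the list above that a document starting with "<?xml" can be guessed to be XML could be sketched roughly as follows. This is only an illustration in Python, not the validator's actual code. The function name looks_like_xml and the handling of byte-order marks are assumptions made for the example, and UTF-32 input is not covered.

```python
import codecs

# Byte-order marks that may precede the first character of the document,
# paired with the codec needed to decode the bytes that follow them.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def looks_like_xml(data: bytes) -> bool:
    """Guess whether a document is XML rather than SGML.

    Hypothetical helper: returns True when the document begins with
    "<?xml", optionally preceded by a UTF-8 or UTF-16 byte-order mark.
    """
    encoding = "ascii"
    for bom, name in _BOMS:
        if data.startswith(bom):
            data = data[len(bom):]
            encoding = name
            break
    # Only the first few characters matter; undecodable bytes are replaced
    # rather than raising, so arbitrary input is safe to pass in.
    head = data[:20].decode(encoding, errors="replace")
    return head.startswith("<?xml")


if __name__ == "__main__":
    print(looks_like_xml(b'<?xml version="1.0" encoding="UTF-8"?><book/>'))
    print(looks_like_xml(b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'))
```

A check of this kind would only be a fallback for the case where neither the media type nor the DOCTYPE settles the question; an explicit user-selected switch, as proposed in the list, would still take precedence.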
Received on Tuesday, 3 October 2006 01:47:34 UTC