- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Wed, 30 May 2007 18:22:19 +0900
- To: olivier Thereaux <ot@w3.org>
- Cc: www-validator@w3.org
Hello Olivier,

At 13:41 07/05/30, olivier Thereaux wrote:

>Many of the issues you raise are already in bugzilla, or have been
>discussed in the past few days and fixed in the dev version.

Great!

>On May 29, 2007, at 17:19 , Martin Duerst wrote:
>
>> I used the data/file that you can find at
>> http://www.sw.it.aoyama.ac.jp/2007/PB1/examples/test.xml
>>
>> With 'direct input' at validator.w3.org, I get
>> "This page is not Valid (no Doctype found)!".
>
>Your document uses a custom document type, not in the validator's
>catalogue.
>And without a media type to help (because you are using the direct
>input mode) there is no unambiguous way to determine whether to use
>XML or SGML parsing modes. The errors you get are, I believe,
>cascading from the fallback to SGML mode, when your DTD elements are
>XML.

Okay.

>This is a known and documented issue:
>http://www.w3.org/Bugs/Public/show_bug.cgi?id=1391
>
>It has been argued that an XML declaration should be a good enough
>trigger, but others (Hixie among others, I believe) have disagreed,
>as it also happens to be a valid SGML PI.

Well, yes, it happens to be a valid SGML PI, of course,
because XML is designed to work with SGML tools, with a
particular SGML declaration.

>Generally speaking, the validator isn't the most adapted tool for
>checking XML documents with home-made DTDs, particularly with the
>Direct Input method. We'd like to make it better in this regard, but
>that is not a priority. If you want to submit patches to make it
>better in this regard, without being detrimental to its main job,

I can definitely submit a patch that goes into XML mode if an
XML declaration is present. I don't consider this as being
detrimental to the validator's job, quite to the contrary.
If that's not what you mean, please tell me.

>I believe you're familiar with the code,

Well, that was quite some time ago, and a lot of work has gone
into the validator since, but to some extent, yes.

>and you even have CVS commit access...
I didn't know that, but I'll try to make use of it. The main
problem will not be the validator code, but CVS; getting from
Subversion back to CVS is a pain.

>> Oh well, there was no doctype? I guess the validator is blind, or
>> what?
>
>That tone is inappropriate. An aggressive or sarcastic tone isn't
>much welcome on this public list (or you'd better be coming with
>perfect patches to compensate).

I totally agree if such a tone was targeted at a person.
Even in the above case, I was probably a bit too direct,
and I apologize. But I guess that's just about how the
average validator user would react.

>> And if I tell it to use some preset doctype only if the
>> doctype is missing, it still tells me that the doctype
>> is missing, so it doesn't look like the "use Doctype"
>> setting in the Options is any good.
>
>This has been fixed in the dev version, soon to be beta2.
>http://qa-dev.w3.org/wmvs/HEAD/

Great to know, thanks.

>> Next, I tried with a DTD located relative to the xml file.
>
>We don't do relative SIs. Yet.
>http://www.w3.org/Bugs/Public/show_bug.cgi?id=1521

If that can be handled in the validator code, I'll try to
submit a patch. But it might take a while.

>> Next I tried with a file with some actual non-ASCII characters.
>> http://www.sw.it.aoyama.ac.jp/2007/PB1/examples/test-UTF-8.xml.
>[...]
>> However, the results on the beta validator are detrimental. I get:
>> Sorry! This document can not be checked.
>>
>> Sorry, I am unable to validate this document because on line 0 it
>> contained one or more bytes that I cannot interpret as us-ascii
>> (in other words, the bytes found are not valid values in the
>> specified Character Encoding). Please check both the content of
>> the file and the character encoding indication.
>>
>> This happens with both URI and File Upload,
>
>I can't reproduce this. Did you perhaps change the encoding
>declaration in the document to state UTF-8 instead of us-ascii?
The document didn't change; even the things reported higher up
always had encoding='UTF-8' in the XML declaration.

The only thing that I changed was that when I drafted the mail
on Sunday, the document was served as text/xml, and I used the
Charset override to make sure it was processed as UTF-8.
I realized that serving documents, most of which are real UTF-8,
as text/xml is a server setup problem, so now the document is
served as text/xml; charset=utf-8. I wouldn't expect the beta
validator to give different results (except for the 'tentatively'
bit) for charset override and charset from Mime type, but I don't
know the code well enough to be sure to exclude this possibility.

Also, in any case, the document never contained any non-ASCII
stuff on line 1 (the only thing there is the XML declaration).

>> even with utf-8 selected
>> in the options. This is a very serious bug, please fix it.
>
>The charset override was broken in the 0.8.0 beta1. It is now fixed.

This would probably explain things, see above.
Is there a plan to release a beta2?

>> For the beta version with file upload or URI input, the "line 0" error
>> raises its ugly head again.
>
>This has been fixed last week I believe.

Great!

Thanks for all your great work,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
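
For illustration only, here is a rough sketch of the check proposed in
this message: switch to XML mode when the input starts with an XML
declaration. It is written in Python and is not taken from the validator
code; the names XML_DECL and guess_parse_mode are hypothetical, and
falling back to SGML mode is an assumption based on the behaviour
described in the exchange.

```python
import re

# Hypothetical helper, not part of the validator code base.
# Matches an XML declaration at the very start of the text, optionally
# preceded by a byte order mark. The XML specification requires the
# declaration to come first, so leading whitespace does not match.
XML_DECL = re.compile(r'^\ufeff?<\?xml\s+version\s*=\s*(["\'])1\.\d+\1[^>]*\?>')


def guess_parse_mode(source: str) -> str:
    """Return "xml" if the text starts with an XML declaration, else "sgml".

    Assumption: with no media type available (direct input), the only
    other choice is the SGML fallback discussed in this thread.
    """
    return "xml" if XML_DECL.match(source) else "sgml"
```

Such a check would only matter when no media type is available, as with
direct input. The objection recorded in the thread still applies: the
same string is also a valid SGML processing instruction, so this is a
heuristic and not an unambiguous test.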
Received on Wednesday, 30 May 2007 09:27:49 UTC