- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Sat, 28 Aug 2004 09:24:28 +0200
- To: Martin Duerst <duerst@w3.org>
- Cc: public-qa-dev@w3.org
* Martin Duerst wrote:
>I'm wondering where you got this last phrase from. The error
>recovery strategy in HTML 4.0 is very much compatible with
>IRIs (maybe with the exception of the IDN part, which wasn't
>imaginable at that time, but once the reference in HTML 4.0
>to RFC 2396 is updated to RFC 2396bis, that problem is
>solved, too).

For example, section 3.1, step 1, variants A and B in
draft-duerst-iri-09 require NFC normalization, which would yield
results that do not comply with the suggestions in the HTML 4.01
Recommendation. I am fine with implementing the suggestion in the
HTML 4.01 Recommendation, if the linkchecker points out that
successful retrieval for such resources depends on error recovery
behavior that only a few user agents implement. But that would not
conform to the IRI internet draft. So these look very much
incompatible to me.

>I don't understand how this statement and the one just above fit
>together. You say that that document doesn't contain a broken
>link, but the link checker still should say it is broken.

No, I said that it should not contain one, i.e., the author should
fix it.

>I remember well that Mozilla implemented the right behavior after
>I put out the first test. Opera did the same. If some more tests,
>and the link checker, can help getting Mozilla back on track, that
>would be great.

Please make sure that these "tests" clearly point out that the
document is non-conforming and that they "test" for informational
error recovery suggestions. I already see users confused by HTML Tidy
correctly pointing out that such documents are non-conforming. If we
update the Markup Validator later this year to do the same, I do not
want to get bug reports for it backed by some "W3C tests".

I am also not sure whether Mozilla will implement different behavior
any time soon; there are many sites that would break if it did.
That's also why Microsoft backed out much of this behavior during the
IE5 beta cycle.
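To make the NFC point from the first reply above concrete, here is a minimal sketch of how the two approaches can diverge. It assumes the HTML 4.01 suggestion (appendix B.2.1) of encoding each non-ASCII character as UTF-8 and %HH-escaping the bytes; the function names and the example.org reference are hypothetical and serve only as illustration.

```python
import unicodedata
from urllib.parse import quote

# Characters left untouched: URI reserved characters plus "%".
# (Unreserved ASCII characters are never escaped by quote().)
SAFE = "/:?#[]@!$&'()*+,;=%"

def html401_escape(ref):
    """HTML 4.01 B.2.1 style: UTF-8 encode, then %HH-escape non-ASCII bytes."""
    return quote(ref.encode("utf-8"), safe=SAFE)

def nfc_then_escape(ref):
    """Same escaping, but with NFC normalization applied first."""
    return html401_escape(unicodedata.normalize("NFC", ref))

# Hypothetical reference written with "e" + U+0301 COMBINING ACUTE ACCENT.
decomposed = "http://example.org/re\u0301sume\u0301"

print(html401_escape(decomposed))   # http://example.org/re%CC%81sume%CC%81
print(nfc_then_escape(decomposed))  # http://example.org/r%C3%A9sum%C3%A9
```

The two results are different URIs, so a server that only knows one of them would return an error for the other.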
>There may be some edge cases that don't work out, but in general,
>these things usually work out.

We'll see. I am not sure whether it is a good idea to publish
software with bugs; finding and fixing them later is costly most of
the time.

>For what I'm planning for the link checker at the moment, I'm not
>sure that will become a module. But it's possible to think about
>how to move that code, or similar code,

This also helps testing and documenting the code. Feel free to post
here if you would like some help writing the modules or publishing
them on CPAN. Maybe you could join one of our meetings to discuss
details?

>>It would also be good if you could implement any transcoding stuff, etc.
>>in a way compatible with perlunicode, setting the UTF-8 flag etc.
>
>Is it possible to do that in a way that doesn't depend on Perl versions?

That depends on what you are trying to achieve. For the Markup
Validator we will require Perl 5.8.0 soon, which should not have any
relevant problem in this regard, but I do not know whether this would
be okay for checklink. It should be possible to use your modules only
if Perl 5.8 is available.

>Thanks for the pointer. I just tested with a shift_jis page, and
>things looked okay. Could you give me the URI of the page that
>produced the errors described in your mail?

Olivier's message actually, and he mentions http://www.google.co.jp.
Try <http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.co.jp>
in the Validator (i.e., validate the validation results) and you
should get

  Sorry, I am unable to validate this document because on lines 297,
  429, 437, 473, 502, 523 it contained one or more bytes that I
  cannot interpret as utf-8 (in other words, the bytes found are not
  valid values in the specified Character Encoding). Please check
  both the content of the file and the character encoding indication.
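The kind of check behind an error message like the one above can be sketched roughly as follows. This is not the Validator's actual code; the function name and the sample page are hypothetical, and the sketch only shows how Shift_JIS bytes fail to decode when the content is labeled utf-8.

```python
def invalid_utf8_lines(data):
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad

# Hypothetical page: line 1 is plain ASCII, line 2 holds Japanese text
# serialized as Shift_JIS, although the content is checked as utf-8.
page = "<title>ok</title>\n<p>\u65e5\u672c\u8a9e</p>\n".encode("shift_jis")

print(invalid_utf8_lines(page))  # [2]
```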
Received on Saturday, 28 August 2004 07:25:11 UTC