- From: Martin Duerst <duerst@w3.org>
- Date: Sat, 28 Aug 2004 10:30:47 +0900
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: public-qa-dev@w3.org
Hello Bjoern,

At 07:48 04/08/27 +0200, Bjoern Hoehrmann wrote:

>* Martin Duerst wrote:
> >I'm planning to work a bit on the link checker in the next few days,
> >to make it comply with the IRI spec.
>
>My understanding is that checklink only supports HTML and XHTML 1.x
>documents,

Yes. But I think it would be fairly easy to extend it to parse other
things, such as SVG, ...

>these document types prohibit anything but RFC 2396 URI References
>and, following HTML 4.0, suggest a poorly implemented error recovery
>strategy which is incompatible with the IRI processing model,

I'm wondering where you got this last phrase from. The error recovery
strategy in HTML 4.0 is very much compatible with IRIs (maybe with the
exception of the IDN part, which wasn't imaginable at that time, but
once the reference in HTML 4.0 to RFC 2396 is updated to RFC 2396bis,
that problem is solved, too).

>so I am not quite sure what you are proposing here. Maybe you could
>give some more details on what you have in mind?
>
> >The link checker, at:
> >http://validator.w3.org/checklink?uri=http%3A%2F%2Fwww.w3.org%2F2001%2F08%2Firi-test%2Flinkcheck%2FresumeHtmlImgSrcBase.html&hide_type=all&depth=&check=Check
> >claims that there is a broken link (which there shouldn't be).
>
>I agree that there should not be a broken link in that document.

Great!

>I do not agree that the link checker should not say that it is broken,

I don't understand how this statement and the one just above fit
together. You say that the document doesn't contain a broken link, but
that the link checker should still say it is broken.

>it clearly is, both from a conformance perspective as well as from a
>user agent support perspective; the link checker should clearly
>indicate that this is the case so that the author can fix the
>document. Mozilla Firefox, for example, fails the "test"; I think it
>is important to most authors that their documents work in Firefox.

Well, most authors want their stuff to work in most browsers. For the
example above, IE, Opera, and Safari work, but Mozilla doesn't. And
Mozilla has worked in some earlier versions, but then for some obscure
reasons switched back to the old 'take it as bytes' model. I remember
well that Mozilla implemented the right behavior after I put out the
first test. Opera did the same. If some more tests, and the link
checker, can help get Mozilla back on track, that would be great.

> >What I'm planning to do is to convert downloaded pages in the link
> >checker to UTF-8 (assuming I can find out what the encoding is).
> >This will be very similar to the validator. The difference is that
> >the link checker will only complain about missing 'charset'
> >information if that information is actually relevant for link
> >checking (i.e. in particular if there are links containing
> >non-ASCII characters).
>
>I am not sure how it is possible to determine whether this information
>is relevant, since you need to transcode the document in order to tell
>whether there are non-ASCII characters, and for transcoding you need
>to know the original encoding.

There may be some edge cases that don't work out, but in general,
these things usually do. We'll see.
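To make this a little more concrete, here is the kind of thing I have
in mind, as a rough and completely untested sketch (the function names
are invented for this mail; it assumes Perl 5.8 with the Encode
module, and that the links have already been extracted from the
document as raw bytes):

    use Encode qw(decode encode);

    # Missing 'charset' information only matters if some link
    # actually contains non-ASCII bytes. This byte-level test is
    # reliable for ASCII-compatible encodings; encodings such as
    # UTF-16 or ISO-2022-JP would be among the edge cases.
    sub charset_matters {
        my @raw_links = @_;                  # links as raw bytes
        return grep { /[^\x00-\x7F]/ } @raw_links;
    }

    # Once the encoding is known, turn a raw link into a character
    # string, then map the IRI to a URI as per the IRI spec:
    # encode as UTF-8 and %-escape every non-ASCII byte.
    sub link_to_uri {
        my ($charset, $raw_link) = @_;
        my $iri   = decode($charset, $raw_link);
        my $bytes = encode('utf-8', $iri);
        $bytes =~ s/([\x80-\xFF])/sprintf('%%%02X', ord($1))/ge;
        return $bytes;
    }

The point of the byte-level test is that it runs before any
transcoding, so pure-ASCII documents never need a 'charset' complaint
at all.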
>Is there any chance you could implement whatever you had in mind here
>as new stand-alone Perl modules, either in the W3C::* namespace or
>probably even better in the more general CPAN namespaces (HTML::,
>URI::, etc.)? It seems these would be mostly of a more general nature
>and likely to be re-used by other tools; that's quite difficult to do
>with inline code, and checklink is already > 2000 lines of code, so we
>should try to avoid adding significantly more code to it.

I was myself quite frightened of the checklink code up to a few days
ago. I'm quite a bit less frightened now, after having looked through
it a few times on the bus. [I don't in any way claim I understand it
yet.] For what I'm planning for the link checker at the moment, I'm
not sure that it will become a module. But it's possible to think
about how to move that code, or similar code, into such modules later.

>It would also be good if you could implement any transcoding stuff,
>etc. in a way compatible with perlunicode, setting the UTF-8 flag etc.

Is it possible to do that in a way that doesn't depend on Perl
versions? Last time I looked into this area, Jungshik pointed me to
some very nasty version dependencies, and when I asked on the
perl-unicode list for advice, nobody had a solution.
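That said, here is roughly what I would try, again as a rough,
untested sketch (the URI and the hard-coded charset are only
stand-ins, and it assumes Perl 5.8 or later, where Encode and the
:utf8 I/O layer are part of the core):

    use strict;
    use LWP::UserAgent;
    use Encode qw(decode);

    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get('http://www.w3.org/');
    die $res->status_line unless $res->is_success;

    my $charset = 'utf-8';   # stand-in; real code would detect this

    # decode() turns the raw bytes into a Perl character string
    # with the UTF-8 flag set, as perlunicode describes:
    my $doc = decode($charset, $res->content);

    # From here on, work only with character strings, and encode
    # exactly once, at output time:
    binmode(STDOUT, ':utf8');
    print "Document has ", length($doc), " characters.\n";

Whether something like this behaves the same across Perl versions is
exactly my question above.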
>The MarkUp Validator currently does not do this and thus tends to
>generate garbage in error messages, see
>
>   http://lists.w3.org/Archives/Public/www-validator/2004Apr/0129.html
>
>for an example.

Thanks for the pointer. I just tested with a shift_jis page, and
things looked okay. Could you give me the URI of the page that
produced the errors described in your mail?

Regards, Martin.

Received on Saturday, 28 August 2004 01:31:18 UTC