- From: olivier Thereaux <ot@w3.org>
- Date: Fri, 7 Mar 2008 12:24:10 -0500
- To: Dominique Hazael-Massieux <dom@w3.org>
- Cc: public-qa-dev <public-qa-dev@w3.org>
Hello,

Quick answer for now, mostly as I don't claim to know all the nooks and crannies of checklink… Ville will probably have more authoritative answers.

On Mar 7, 2008, at 09:16, Dominique Hazael-Massieux wrote:
> The mobileOK checker currently has a pretty crude algorithm when parsing
> HTML pages and CSS style sheets to resolve URIs that it finds in there:
> if the URI matches the syntax in the RFC, it proceeds, otherwise, it
> reports an error.

I don't think the link checker reports any error in URI syntax at parse time. As far as I can remember, we use a subclassed HTML::Parser to get the content of a few key attributes, and pass that to the checker's list of links to check.

> So, I have a question and a suggestion:
> * the question is: how does the link checker parse URIs? I assume it
> needs to do so when making relative URIs absolute, as well as when doing
> HEAD/GET requests?

We rely on the Perl URI library for that. In particular, I think checklink mostly uses the new_abs() routine from http://search.cpan.org/dist/URI/URI.pm

> * the suggestion is: maybe the link checker should warn its users about
> links that don't match what the RFC requires?

Have you got some test cases for that? I'd like to add them to the link checker's test suite, and get a better idea of how it handles them.

> (of course, this probably opens up some dreaded cans of worms about
> URIs, IRIs and canonicalization)

Oh yes. :) Especially since the HTML spec refers to URI (not IRI) normatively, and we regularly get a mini-flamewar on the subject...

-- 
olivier
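
To make the two steps discussed above concrete (making a relative reference absolute, and flagging references that don't match RFC 3986), here is a minimal sketch. It is in Python and uses urllib.parse.urljoin as a stand-in for the Perl URI library's new_abs(); it is not checklink's or the mobileOK checker's actual code. The function names are hypothetical, and the syntax check only looks at which characters appear, so it is much cruder than a full grammar validation.

```python
import re
from urllib.parse import urljoin

# Characters RFC 3986 permits in a URI reference: unreserved, reserved, and "%".
# Assumption: a character-level check is enough for illustration purposes.
_ALLOWED = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")
# A "%" that is not followed by two hex digits is a malformed escape.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def looks_like_rfc3986(ref: str) -> bool:
    """Return True if ref uses only RFC 3986 characters and valid %-escapes."""
    return bool(_ALLOWED.fullmatch(ref)) and not _BAD_PERCENT.search(ref)


def resolve(base: str, ref: str) -> str:
    """Make a (possibly relative) reference absolute against a base URI."""
    return urljoin(base, ref)


def check_links(base: str, refs: list[str]) -> list[tuple[str, str, bool]]:
    """For each reference, return (original, absolute form, syntax_ok)."""
    return [(ref, resolve(base, ref), looks_like_rfc3986(ref)) for ref in refs]


if __name__ == "__main__":
    base = "http://www.example.org/a/b.html"
    for ref, absolute, ok in check_links(base, ["../c.css", "foo bar.html"]):
        status = "ok" if ok else "WARNING: not valid RFC 3986 syntax"
        print(f"{ref!r} -> {absolute} ({status})")
```

A reference such as "foo bar.html" (raw space) or "café.html" (unescaped non-ASCII, i.e. an IRI rather than a URI) would be flagged by this check, which is exactly where the URI/IRI question comes in.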
Received on Friday, 7 March 2008 17:24:22 UTC