- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Fri, 07 Mar 2008 15:16:22 +0100
- To: public-qa-dev <public-qa-dev@w3.org>
Hi,

The mobileOK checker currently uses a pretty crude algorithm to resolve the URIs it finds when parsing HTML pages and CSS style sheets: if the URI matches the syntax in the RFC, it proceeds; otherwise, it reports an error.

This algorithm is crude because many Web pages use URIs with characters that ought to be escaped according to the RFC but aren't, and most Web browsers deal with these cases just fine.

So, I have a question and a suggestion:

* the question is: how does the link checker parse URIs? I assume it needs to do so when making relative URIs absolute, as well as when doing HEAD/GET requests. How lenient is it with regard to what the RFC allows? Where does it draw the line between a broken link and a non-broken one for URIs that don't match the RFC requirements?

* the suggestion is: maybe the link checker should warn its users about links that don't match what the RFC requires? (Of course, this probably opens up some dreaded cans of worms about URIs, IRIs and canonicalization.)

Dom
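To make the strict-versus-lenient distinction above concrete, here is a minimal sketch in Python. It is not how the mobileOK checker or the link checker actually work. It assumes "the RFC" means RFC 3986, and the names `is_rfc_clean` and `resolve_leniently` are made up for illustration. The check is character-level only, not a full grammar validation.

```python
import re
from urllib.parse import quote, urljoin

# Assumption: "the RFC" is RFC 3986. Characters it allows literally in a
# URI reference (unreserved, gen-delims, sub-delims) plus %XX escapes.
_ALLOWED = re.compile(
    r"^(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*$"
)


def is_rfc_clean(uri: str) -> bool:
    """Strict check: True only if every character is allowed as-is."""
    return _ALLOWED.match(uri) is not None


def resolve_leniently(base: str, ref: str):
    """Resolve ref against base, escaping stray characters first.

    Returns (absolute_uri, warning), where warning is None when the
    reference already passed the strict check.
    """
    ref = ref.strip()
    if is_rfc_clean(ref):
        return urljoin(base, ref), None
    # Percent-encode disallowed characters (as UTF-8), leaving reserved
    # characters and existing escapes alone. A stray "%" is left as-is.
    fixed = quote(ref, safe="-._~:/?#[]@!$&'()*+,;=%")
    warning = f"URI {ref!r} contains characters that should be escaped"
    return urljoin(base, fixed), warning


if __name__ == "__main__":
    print(resolve_leniently("http://example.org/dir/index.html", "my page.html"))
```

A strict checker would stop at `is_rfc_clean` and report an error. The lenient variant still resolves the link, much as browsers do, but keeps the warning so the user can be told about it.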
Received on Friday, 7 March 2008 14:16:54 UTC