
Re: Fwd: Parsing Microdata into RDF Graphs: URI Comparison

From: Philip Jägenstedt <philipj@opera.com>
Date: Sun, 30 Oct 2011 10:19:24 +0100
To: public-html-data-tf@w3.org
Message-ID: <op.v35qumv8sr6mfa@localhost.localdomain>
On Sun, 30 Oct 2011 07:50:50 +0100, Jeni Tennison <jeni@jenitennison.com>  
wrote:

> Henri, Ted, Philip,
>
> I wonder if you could help here. Do you know of examples where the HTML  
> URL resolution algorithm produces different results from the RFC-3987  
> resolution algorithm? Is there a publicly available test suite that you  
> know of or a tool that you know does HTML URL resolution correctly that  
> could be used to generate accurate tests?
>
> Thanks,
>
> Jeni

URL parsing [1] is modified in HTML to be more forgiving than RFC 3986,  
e.g. it seems like the following would be invalid per RFC 3986 but would  
still parse using the modified rules:

http://example.com/%
http://example.com/##

This is just a qualified guess. Python's urlparse still parses these just  
fine, so either Python also doesn't follow RFC 3986 or I fail at reading  
specs. This is a willful violation, and was probably part of HTML WG  
ISSUE-56, [2] so anyone who takes offense ought to look through that first.
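To check the Python observation above, here is a small sketch using the standard library's urllib.parse.urlparse (the Python 3 home of urlparse). The helper rfc3986_problems is hypothetical and only implements two narrow checks taken from the RFC 3986 grammar: a "%" must start a pct-encoded triplet of two hex digits, and "#" may not appear inside the fragment. It is not a full RFC 3986 validator.

```python
import re
from urllib.parse import urlparse

# RFC 3986: pct-encoded = "%" HEXDIG HEXDIG, so a bare "%" is invalid.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def rfc3986_problems(url: str) -> list[str]:
    """Hypothetical helper: report only the two issues shown in the examples.

    This is NOT a complete RFC 3986 validator.
    """
    problems = []
    if _BAD_PERCENT.search(url):
        problems.append("'%' not followed by two hex digits")
    # "#" is a gen-delim that starts the fragment; it is not allowed
    # inside the fragment itself, so more than one "#" is invalid.
    if url.count("#") > 1:
        problems.append("'#' inside the fragment")
    return problems


for url in ["http://example.com/%", "http://example.com/##"]:
    parts = urlparse(url)  # does not validate, so no error is raised
    print(url, "->", parts.path, repr(parts.fragment), rfc3986_problems(url))
```

Both URLs come back from urlparse without error, with the path "/%" and the fragment "#" respectively, while the two grammar checks flag each of them.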

As for resolving, [3] I think the main difference is that the base URL can  
come from a <base> element.
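A rough sketch of where the base URL comes from, assuming the rule that the first <base> element with an href attribute (in document order) supplies the base, itself resolved against the document's own address. It uses Python's html.parser and urllib.parse.urljoin (which follows RFC 3986 reference resolution); the class and function names are hypothetical, and this does not reproduce HTML's own forgiving URL parser or its handling of documents without a real address.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class FirstBaseHref(HTMLParser):
    """Hypothetical helper: remembers the href of the first <base> that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        # A self-closing <base .../> also ends up here via handle_startendtag.
        if tag != "base" or self.href is not None:
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.href = value
                return


def document_base_url(html: str, document_url: str) -> str:
    """Base URL: the first <base href>, resolved against the document's URL."""
    parser = FirstBaseHref()
    parser.feed(html)
    parser.close()
    if parser.href is None:
        return document_url
    return urljoin(document_url, parser.href)


def resolve(html: str, document_url: str, reference: str) -> str:
    """Resolve a (possibly relative) URL as it appears in the given document."""
    return urljoin(document_base_url(html, document_url), reference)


page = '<head><base href="/static/"></head><img src="img/logo.png">'
print(resolve(page, "http://example.com/a/page.html", "img/logo.png"))
# http://example.com/static/img/logo.png
```

Without the <base> element, the same reference would resolve against the document's address instead, giving http://example.com/a/img/logo.png.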

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#parse-a-url
[2] http://www.w3.org/html/wg/tracker/issues/56
[3] http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#resolving-urls

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Sunday, 30 October 2011 09:20:04 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Sunday, 30 October 2011 09:20:04 GMT