- From: Robert J Burns <rob@robburns.com>
- Date: Sat, 28 Jun 2008 22:10:26 +0300
- To: Michael (tm) Smith <mike@w3.org>
- Cc: public-html@w3.org
Hi Mike,

We used to have a general guideline in this WG that changes needed to be accompanied by use cases. However, from reading the message you sent and the new changes to the draft, I have no idea what changes have been made to the definitions of URL and IRI, nor what real problems such changes are meant to solve. You say[1]:

> The rationale for redefining the term "URL" -- and for including
> the sections that specify URL parsing rules for user agents and
> how user agents must resolve URLs -- is provided in the "URLs"
> section introduction:

But obviously it does not follow that, because user agents parse, resolve and otherwise handle URLs in ways specific to HTML5 UAs, the URLs and IRIs themselves need to be redefined. It should be sufficient to use the URL and IRI definitions as they are and add document conformance language surrounding "valid URL", along with defining how HTML5 UAs handle, resolve, parse and prepare URLs for HTTP (and other) requests.

So why not accept and adopt the existing IRI and URL specifications in HTML5 and then specify document conformance norms for a "valid URL" (and a valid IRI if necessary)? Then we can clearly define what characters require percent escaping within a document and how the URL will be delivered in schema requests.

Also, the edge cases that Philip, Ian, Julian and others have uncovered are both extremely problematic and very much edge cases. This is an area where we should be very careful (more so than previously) about codifying poor implementations. For example, the behavior Philip discovered, where percent-escaped UTF-8 and escaped UTF-16 do not always get sent as they should, is not something we should encourage[2] (Firefox appears to be the only one doing the right thing here). Fixing this will make sites work in a more interoperable way.
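To make the percent-escaping point concrete, here is a minimal sketch in Python of escaping the non-ASCII characters in a URL's path and query as UTF-8. The function name `escape_non_ascii` and the example.org address are made up for illustration, and this is not what any HTML5 draft or browser specifies: it leaves the host untouched (internationalized host names need separate handling) and simply preserves any percent escapes already present.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_non_ascii(url: str) -> str:
    """Percent-escape the path and query of `url` as UTF-8 (illustrative only)."""
    parts = urlsplit(url)
    # quote() encodes str input as UTF-8 by default; "%" is listed as safe
    # so that escapes the author already wrote are not escaped a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    # Scheme, host and fragment are passed through unchanged in this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


escaped = escape_non_ascii("http://example.org/r\u00e9sum\u00e9?q=\u00e9")
print(escaped)
```

With this sketch, each "é" in the path and query is sent as the two-byte UTF-8 sequence %C3%A9, regardless of the encoding of the document the URL appeared in.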
Finally, having UAs (and server agents) properly handle UTF-8 (%ww%xx%yy%zz) and UTF-16 (\uxxxx) will help the transition to transmission of pure Unicode URLs, since authors can switch soon to escaping (as has been recommended for years) and HTML5 UAs could later escape all URLs with non-Latin characters for transmission in schema requests. In this way, authors could eventually write URLs (and IRIs in this case) in the document’s encoding and the UA would handle the encoding for requests (either UTF-8 or UTF-16, as the UA or HTML5 sees fit for the particular characters involved in the IRI). Perhaps this is where Ian’s heading already; it is impossible to tell from what’s in your email or in the spec so far.

So to summarize:

1) adopt URLs and IRIs as specified elsewhere
2) define HTML5 document conformance for author use of URLs and IRIs
3) define HTML5 UA conformance for handling of URLs and IRIs (including for non-conforming URLs and IRIs)
4) try to push HTML toward the transmission of pure UTF-8 and UTF-16 (when necessary through percent-encoding) for URL and IRI requests

If there's some reason not to follow such an approach (especially for items one, two, and three together), then what are the use cases and problem statements that necessitate our divergence from the existing specifications?

Take care,
Rob

[1]: <http://lists.w3.org/Archives/Public/public-html/2008Jun/0348.html>
[2]: <http://lists.w3.org/Archives/Public/public-html/2008Jun/0358.html>
Received on Saturday, 28 June 2008 19:11:13 UTC