- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 25 Sep 2012 04:18:03 +0000 (UTC)
- To: David Sheets <kosmo.zb@gmail.com>
- Cc: whatwg <whatwg@whatwg.org>
This is Anne's spec, so I'll let him give more canonical answers, but:

On Mon, 24 Sep 2012, David Sheets wrote:
>
> Your conforming WHATWG-URL syntax will have production rule alphabets
> which are supersets of the alphabets in RFC3986.

Not necessarily, but that's certainly possible.

Personally I would recommend that we not change the definition of what is
conforming from the current RFC3986/RFC3987 rules, except to the extent
that the character encoding affects it (as per the HTML standard today).

   http://whatwg.org/html#valid-url

> This is what I propose you define and it does not necessarily have to be
> in BNF (though a production rule language of some sort probably isn't a
> bad idea).

We should definitely define what is a conforming URL, yes (either
directly, or by reference to the RFCs, as HTML does now). Whether prose or
a structured language is the better way to go depends on what the
conformance rules are -- HTML is a good example here: it has parts that
are defined in terms of prose (e.g. the HTML syntax as a whole), and other
parts that are defined in terms of BNF (e.g. constraints on the contents
of <script> elements in certain situations). It's up to Anne.

> Error recovery and extended syntax for conforming representations are
> orthogonal.

Indeed.

> How will WHATWG-URLs which use the syntax extended from RFC3986 map into
> RFC3986 URI references for systems that only support those?

The same way that those systems handle invalid URLs today, I would
assume. Do you have any concrete systems in mind here? It would be good to
add them to the list of systems that we test.

(For what it's worth, in practice, I've never found software that exactly
followed RFC3986 and also rejected any non-conforming strings. There are
just too many invalid URLs out there for that to be a viable
implementation strategy.)

I remember, when I was testing this years ago while doing the first pass
at attempting to fix this, finding that some less widely tested software,
e.g. wget(1), did not handle URLs in the same manner as more widely tested
software, e.g. IE, with the result that Web pages were not handled
interoperably between these two classes of software. This is the kind of
thing we want to stop, by providing a single way to parse all input
strings, valid or invalid, as URLs.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
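[Editorial note: the point above about deployed software accepting strings that RFC 3986 forbids can be illustrated with a small sketch. The example below uses Python's standard-library `urllib.parse.urlsplit`, which, like browser parsers, splits rather than validates; the sample URL and the strict check are hypothetical, not from the email.]

```python
from urllib.parse import urlsplit
import re

# A string that is not a valid RFC 3986 URI (it contains unencoded
# spaces), yet lenient parsers still break it into components.
invalid_url = "http://example.com/a path?q=1 2"

parts = urlsplit(invalid_url)
print(parts.netloc)  # "example.com" -- parsed despite the invalid input
print(parts.path)    # "/a path"     -- the space is passed through
print(parts.query)   # "q=1 2"

# By contrast, a (simplified, hypothetical) strict check against the
# characters RFC 3986 permits would reject the same string outright.
RFC3986_CHARS = re.compile(r"^[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*$")
print(bool(RFC3986_CHARS.match(invalid_url)))  # False: contains spaces
```

A strict-validation strategy would have to reject such strings, which, as the email notes, is not viable given how many invalid URLs exist in deployed content; the WHATWG approach instead defines one parsing behavior for all inputs.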
Received on Tuesday, 25 September 2012 04:21:02 UTC