- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Fri, 27 Jun 2008 23:04:59 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv26846

Modified Files:
	Overview.html
Log Message:
Make backslashes turn into forward slashes when parsing URLs. Sigh. (whatwg r1823)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.1011
retrieving revision 1.1012
diff -u -d -r1.1011 -r1.1012
--- Overview.html 27 Jun 2008 21:57:57 -0000 1.1011
+++ Overview.html 27 Jun 2008 23:04:56 -0000 1.1012
@@ -2826,106 +2826,119 @@
 <h4 id=parsing0><span class=secno>2.3.2 </span>Parsing URLs</h4>
 
 <p>To <dfn id=parse0>parse a URL</dfn> <var title="">url</var> into its
-  component parts, the user agent must first strip leading and trailing <a
-  href="#space" title="space character">space characters</a> from <var
-  title="">url</var>, and then must parse <var title="">url</var> in the
-  manner defined by RFC 3986, with the following exceptions:
+  component parts, the user agent must use the following steps:
 
- <ul>
-  <li>Add all characters with codepoints less than or equal to U+0020 or
-   greater than or equal to U+007F to the <unreserved> production.
+ <ol>
+  <li>
+   <p>Strip leading and trailing <a href="#space" title="space
+    character">space characters</a> from <var title="">url</var>.
 
-  <li>Add the characters U+0022, U+003C, U+003E, U+005B .. U+005E, U+0060,
-   and U+007B .. U+007D to the <unreserved> production. <!--
-   0022 QUOTATION MARK
-   003C LESS-THAN SIGN
-   003E GREATER-THAN SIGN
-   005B LEFT SQUARE BRACKET
-   005C REVERSE SOLIDUS
-   005D RIGHT SQUARE BRACKET
-   005E CIRCUMFLEX ACCENT
-   0060 GRAVE ACCENT
-   007B LEFT CURLY BRACKET
-   007C VERTICAL LINE
-   007D RIGHT CURLY BRACKET
-   -->
-
+  <li>
+   <p>Replace all U+005C REVERSE SOLIDUS (\) characters in <var
+    title="">url</var> with U+002F SOLIDUS (/) characters.
 
-  <li>Add a single U+0025 PERCENT SIGN character as a second alternative way
-   of matching the <pct-encoded> production, except when the
-   <pct-encoded> is used in the <reg-name> production.
+  <li>
+   <p>Parse <var title="">url</var> in the manner defined by RFC 3986, with
+    the following exceptions:</p>
 
-  <li>Add the U+0023 NUMBER SIGN character to the characters allowed in the
-   <fragment> production.</li>
-  <!-- some browsers also have other differences, e.g. Mozilla
-  seems to treat ";" as if it was not in sub-delims, if the scheem
-  is "ftp". -->
- </ul>
+   <ul>
+    <li>Add all characters with codepoints less than or equal to U+0020 or
+     greater than or equal to U+007F to the <unreserved> production.
 
- <p>If <var title="">url</var> doesn't match the <URI-reference>
-  production, even after the above changes are made to the ABNF definitions,
-  then parsing the URL fails with an error. <a
-  href="#references">[RFC3986]</a>
+    <li>Add the characters U+0022, U+003C, U+003E, U+005B .. U+005E, U+0060,
+     and U+007B .. U+007D to the <unreserved> production. <!--
+     0022 QUOTATION MARK
+     003C LESS-THAN SIGN
+     003E GREATER-THAN SIGN
+     005B LEFT SQUARE BRACKET
+     005C REVERSE SOLIDUS
+     005D RIGHT SQUARE BRACKET
+     005E CIRCUMFLEX ACCENT
+     0060 GRAVE ACCENT
+     007B LEFT CURLY BRACKET
+     007C VERTICAL LINE
+     007D RIGHT CURLY BRACKET
+     -->
+
 
- <p>If parsing <var title="">url</var> was successful, then the components
-  of the URL are substrings of <var title="">url</var> defined as follows:
+    <li>Add a single U+0025 PERCENT SIGN character as a second alternative
+     way of matching the <pct-encoded> production, except when the
+     <pct-encoded> is used in the <reg-name> production.
 
- <dl>
-  <dt><dfn id=ltschemegt title=url-scheme><scheme></dfn>
+    <li>Add the U+0023 NUMBER SIGN character to the characters allowed in
+     the <fragment> production.</li>
+    <!-- some browsers also have other differences, e.g. Mozilla
+    seems to treat ";" as if it was not in sub-delims, if the scheem
+    is "ftp". -->
+   </ul>
 
-  <dd>
-   <p>The substring matched by the <scheme> production, if any.
+  <li>
+   <p>If <var title="">url</var> doesn't match the <URI-reference>
+    production, even after the above changes are made to the ABNF
+    definitions, then parsing the URL fails with an error. <a
+    href="#references">[RFC3986]</a></p>
 
-  <dt><dfn id=lthostgt title=url-host><host></dfn>
+   <p>Otherwise, parsing <var title="">url</var> was successful; the
+    components of the URL are substrings of <var title="">url</var> defined
+    as follows:</p>
 
-  <dd>
-   <p>The substring matched by the <host> production, if any.
+   <dl>
+    <dt><dfn id=ltschemegt title=url-scheme><scheme></dfn>
 
-  <dt><dfn id=ltportgt title=url-port><port></dfn>
+    <dd>
+     <p>The substring matched by the <scheme> production, if any.
 
-  <dd>
-   <p>The substring matched by the <port> production, if any.
+    <dt><dfn id=lthostgt title=url-host><host></dfn>
 
-  <dt><dfn id=lthostportgt title=url-hostport><hostport></dfn>
+    <dd>
+     <p>The substring matched by the <host> production, if any.
 
-  <dd>
-   <p>If there is a <scheme> component and a <port> component
-   and the port given by the <port> component is different than the
-   default port defined for the protocol given by the <scheme>
-   component, then <hostport> is the substring that starts with the
-   substring matched by the <host> production and ends with the
-   substring matched by the <port> production, and includes the colon
-   in between the two. Otherwise, it is the same as the <host>
-   component.</p>
+    <dt><dfn id=ltportgt title=url-port><port></dfn>
 
-  <dt><dfn id=ltpathgt title=url-path><path></dfn>
+    <dd>
+     <p>The substring matched by the <port> production, if any.
 
-  <dd>
-   <p>The substring matched by one of the following productions, if one of
-   them was matched:</p>
+    <dt><dfn id=lthostportgt title=url-hostport><hostport></dfn>
 
-   <ul class=brief>
-    <li><path-abempty>
+    <dd>
+     <p>If there is a <scheme> component and a <port> component
+     and the port given by the <port> component is different than the
+     default port defined for the protocol given by the <scheme>
+     component, then <hostport> is the substring that starts with the
+     substring matched by the <host> production and ends with the
+     substring matched by the <port> production, and includes the
+     colon in between the two. Otherwise, it is the same as the
+     <host> component.</p>
 
-    <li><path-absolute>
+    <dt><dfn id=ltpathgt title=url-path><path></dfn>
 
-    <li><path-noscheme>
+    <dd>
+     <p>The substring matched by one of the following productions, if one of
+     them was matched:</p>
 
-    <li><path-rootless>
+     <ul class=brief>
+      <li><path-abempty>
 
-    <li><path-empty>
-   </ul>
+      <li><path-absolute>
 
-  <dt><dfn id=ltquerygt title=url-query><query></dfn>
+      <li><path-noscheme>
 
-  <dd>
-   <p>The substring matched by the <query> production, if any.
+      <li><path-rootless>
 
-  <dt><dfn id=ltfragmentgt title=url-fragment><fragment></dfn>
+      <li><path-empty>
+     </ul>
 
-  <dd>
-   <p>The substring matched by the <fragment> production, if any.
-  </dl>
+    <dt><dfn id=ltquerygt title=url-query><query></dfn>
+
+    <dd>
+     <p>The substring matched by the <query> production, if any.
+
+    <dt><dfn id=ltfragmentgt title=url-fragment><fragment></dfn>
+
+    <dd>
+     <p>The substring matched by the <fragment> production, if any.
+   </dl>
+  </ol>
 
 <h4 id=resolving><span class=secno>2.3.3 </span>Resolving URLs</h4>
 
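The change adds two steps that run before any grammar-based parsing. First, space characters are stripped from both ends of the URL. Second, every backslash is turned into a forward slash. Both are plain string preprocessing. The Python sketch below illustrates them. It is not code from the spec or from any browser. The helper name `preprocess_url` is hypothetical. The set of space characters is also an assumption, because the diff only links to the spec's definition without reproducing it.

```python
# Assumption: the spec's "space characters" are taken to be these five;
# the diff links to the definition without reproducing it.
SPACE_CHARACTERS = "\u0020\u0009\u000a\u000c\u000d"


def preprocess_url(url: str) -> str:
    """Hypothetical helper applying the first two steps of the revised algorithm."""
    # Step 1: strip leading and trailing space characters.
    url = url.strip(SPACE_CHARACTERS)
    # Step 2: replace every U+005C REVERSE SOLIDUS with U+002F SOLIDUS.
    return url.replace("\\", "/")


if __name__ == "__main__":
    # Input text is "  http:\\example.com\a\b?x=1" followed by a newline.
    print(preprocess_url("  http:\\\\example.com\\a\\b?x=1\n"))
```

With the sample input, the script prints `http://example.com/a/b?x=1`. That is the behaviour the log message describes: backslashes become forward slashes before the RFC 3986 parse.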
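After preprocessing, the spec parses the string against the RFC 3986 grammar with the listed exceptions. A faithful implementation would encode the modified ABNF. As a lighter stand-in, the sketch below uses a different technique: the component-splitting regular expression from RFC 3986, Appendix B. That expression never rejects input, so this sketch does not implement the failure case. It also returns an authority rather than separate host and port components. The helper name `split_url` is hypothetical.

```python
import re
from typing import Dict, Optional

# Regular expression from RFC 3986, Appendix B. It splits any string into
# scheme, authority, path, query and fragment without validating it, so it
# stands in for (and is more permissive than) the modified ABNF.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_url(url: str) -> Dict[str, Optional[str]]:
    """Hypothetical helper; absent components are returned as None."""
    m = URI_SPLIT.match(url)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),  # always present, possibly empty
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    print(split_url("http://example.com:8080/a/b?x=1#top#more"))
```

For the sample URL, the fragment comes back as `top#more`. The expression lets the fragment run to the end of the string, which happens to match the exception that allows U+0023 NUMBER SIGN inside <fragment>.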
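The <hostport> component is defined by a small conditional, sketched below in Python. The helper name `hostport` and the `DEFAULT_PORTS` table are hypothetical: the spec text refers to "the default port defined for the protocol" without listing any. Treating an empty <port> (as in `example.com:`) as absent is also an assumption. Ports are compared numerically. When the port is not the default, the sketch returns the original substrings joined by a colon, as the definition describes.

```python
from typing import Optional

# Assumption: illustrative default ports only; the spec text refers to
# "the default port defined for the protocol" without listing any.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def hostport(scheme: Optional[str], host: str, port: Optional[str]) -> str:
    """Hypothetical helper computing <hostport> from parsed components."""
    # Assumption: an empty <port> (e.g. "example.com:") is treated as absent.
    if scheme is not None and port:
        default = DEFAULT_PORTS.get(scheme.lower())  # None if scheme unknown
        if int(port) != default:
            # Non-default port: the host substring, a colon, then the port.
            return host + ":" + port
    # No scheme, no port, or the default port: same as <host>.
    return host


if __name__ == "__main__":
    print(hostport("http", "example.com", "80"))    # example.com
    print(hostport("http", "example.com", "8080"))  # example.com:8080
```

A scheme missing from the table has no known default. Any port it carries is therefore treated as different and kept.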
Received on Friday, 27 June 2008 23:05:33 UTC