- From: Ian Hickson via cvs-syncmail <cvsmail@w3.org>
- Date: Tue, 27 Sep 2011 19:10:17 +0000
- To: public-html-commits@w3.org
Update of /sources/public/html5/spec
In directory hutz:/tmp/cvs-serv11476

Modified Files:
	Overview.html
Log Message:
Try to make the application/x-www-form-urlencoded algorithm work even for ISO-2022-JP's crazy escape schemes. (whatwg r6592)

Index: Overview.html
===================================================================
RCS file: /sources/public/html5/spec/Overview.html,v
retrieving revision 1.5285
retrieving revision 1.5286
diff -u -d -r1.5285 -r1.5286
--- Overview.html 26 Sep 2011 22:32:15 -0000 1.5285
+++ Overview.html 27 Sep 2011 19:10:13 -0000 1.5286
@@ -321,7 +321,7 @@
 <h1>HTML5</h1>
 <h2 class="no-num no-toc" id="a-vocabulary-and-associated-apis-for-html-and-xhtml">A vocabulary and associated APIs for HTML and XHTML</h2>
- <h2 class="no-num no-toc" id="editor-s-draft-26-september-2011">Editor's Draft 26 September 2011</h2>
+ <h2 class="no-num no-toc" id="editor-s-draft-27-september-2011">Editor's Draft 27 September 2011</h2>
 <dl><dt>Latest Published Version:</dt>
 <dd><a href="http://www.w3.org/TR/html5/">http://www.w3.org/TR/html5/</a></dd>
 <dt>Latest Editor's Draft:</dt>
@@ -467,7 +467,7 @@
 Group</a> is the W3C working group responsible for this
 specification's progress along the W3C Recommendation
 track.
- This specification is the 26 September 2011 Editor's Draft.
+ This specification is the 27 September 2011 Editor's Draft.
 </p><!-- UNDER NO CIRCUMSTANCES IS THE PRECEDING PARAGRAPH TO BE REMOVED OR EDITED WITHOUT TALKING TO IAN FIRST --><p>Work on this specification is also done at the <a href="http://www.whatwg.org/">WHATWG</a>. The W3C HTML working group actively pursues convergence with the
 WHATWG, as required by the <a href="http://www.w3.org/2007/03/HTML-WG-charter">W3C HTML working group charter</a>.</p><!-- UNDER NO CIRCUMSTANCES IS THE FOLLOWING PARAGRAPH TO BE REMOVED OR EDITED WITHOUT TALKING TO IAN FIRST --><p>This document was
 produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5
@@ -40089,7 +40089,14 @@
 
 <li><p>Return the <var title="">form data set</var>.</li>
 
- </ol></div><h5 id="url-encoded-form-data"><span class="secno">4.10.22.5 </span>URL-encoded form data</h5><div class="impl">
+ </ol></div><h5 id="url-encoded-form-data"><span class="secno">4.10.22.5 </span>URL-encoded form data</h5><p class="note">This form data set encoding is in many ways an
+ aberrant monstrosity, the result of many years of implementation
+ accidents and compromises leading to a set of requirements necessary
+ for interoperability, but in no way representing good design
+ practices. In particular, readers are cautioned to pay close
+ attention to the twisted details involving repeated (and in some
+ cases nested) conversions between character encodings and byte
+ sequences.<div class="impl">
 
 <p>The <dfn id="application-x-www-form-urlencoded-encoding-algorithm"><code title="">application/x-www-form-urlencoded</code> encoding
 algorithm</dfn> is as follows:</p>
@@ -40140,65 +40147,65 @@
 
 <li>
 
- <p>For each character in the entry's name and value, apply the
+ <p>Encode the entry's name and value using the selected
+ character encoding. The entry's name and value are now byte
+ strings.</p>
+
+ </li>
+
+ <li>
+
+ <p>For each byte in the entry's name and value, apply the
 appropriate subsubsteps from the following list:</p>
 
- <dl class="switch"><dt>The character is a U+0020 SPACE character</dt>
+ <dl class="switch"><dt>The byte is 0x20 (U+0020 SPACE if interpreted as ASCII)</dt>
 
- <dd>Replace the character with a single U+002B PLUS SIGN
- character (+).</dd>
+ <dd>Replace the byte with a single 0x2B byte (U+002B PLUS SIGN
+ character (+) if interpreted as ASCII).</dd>
 
- <dt>If the character is in the range U+002A, U+002D, U+002E,
- U+0030 to U+0039, U+0041 to U+005A, U+005F, U+0061 to
- U+007A</dt>
+ <dt>If the byte is in the range 0x2A, 0x2D, 0x2E, 0x30 to 0x39,
+ 0x41 to 0x5A, 0x5F, 0x61 to 0x7A</dt>
 
- <dd><p>Leave the character as is.</dd>
+ <dd><p>Leave the byte as is.</dd>
 
 <dt>Otherwise</dt>
 
 <dd>
 
- <p>Replace the character with a string formed as follows:</p>
-
- <ol><li><p>Let <var title="">s</var> be an empty string.</li>
-
- <li>
-
- <p>For each byte <var title="">b</var> of the character when
- expressed in the selected character encoding in turn, run
- the appropriate subsubsubstep from the list below:</p>
+ <ol><li><p>Let <var title="">s</var> be a string consisting of a
+ U+0025 PERCENT SIGN character (%) followed by two characters
+ in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9)
+ and U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL
+ LETTER F representing the hexadecimal value of the byte in
+ question (zero-padded if necessary).</li>
 
- <dl class="switch"><dt>If the byte is in the range 0x20, 0x2A, 0x2D, 0x2E,
- 0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A</dt>
+ <li><p>Encode the string <var title="">s</var> as US-ASCII,
+ so that it is now a byte string.</p>
 
- <dd><p>Append to <var title="">s</var> the Unicode
- character with the code point equal to the byte.</dd>
+ <li><p>Replace the byte in question in the name or value
+ being processed by the bytes in <var title="">s</var>,
+ preserving their relative order.</li>
 
- <dt>Otherwise</dt>
+ </ol></dd>
 
- <dd><p>Append to the string a U+0025 PERCENT SIGN character
- (%) followed by two characters in the ranges U+0030 DIGIT
- ZERO (0) to U+0039 DIGIT NINE (9) and U+0041 LATIN CAPITAL
- LETTER A to U+0046 LATIN CAPITAL LETTER F representing the
- hexadecimal value of the byte (zero-padded if
- necessary).</dd>
+ </dl></li>
 
- </dl></li>
+ <li>
 
- </ol></dd>
+ <p>Interpret the entry's name and value as Unicode strings
+ encoded in US-ASCII. (All of the bytes in the string will be in
+ the range 0x00 to 0x7F; the high bit will be zero throughout.)
+ The entry's name and value are now Unicode strings again.</p>
 
- </dl></li>
+ </li>
 
- <li><p>If the entry's name is "<code title="attr-fe-name-isindex"><a href="#attr-fe-name-isindex">isindex</a></code>",
- its type is "<code title="">text</code>", and this is the first
- entry in the <var title="">form data set</var>, then append the
- value to <var title="">result</var> and skip the rest of the
- substeps for this entry, moving on to the next entry, if any, or
- the next step in the overall algorithm otherwise.</li>
+ <li><p>If the entry's name is "<code title="attr-fe-name-isindex"><a href="#attr-fe-name-isindex">isindex</a></code>", its type is "<code title="">text</code>", and this is the first entry in the <var title="">form data set</var>, then append the value to <var title="">result</var> and skip the rest of the substeps for this
+ entry, moving on to the next entry, if any, or the next step in
+ the overall algorithm otherwise.</li>
 
 <li><p>If this is not the first entry, append a single U+0026
 AMPERSAND character (&) to <var title="">result</var>.</li>
@@ -40288,8 +40295,8 @@
 </li>
 
 <li><p>Convert the <var title="">name</var> and <var title="">value</var> strings to their byte representation in
- US-ASCII (i.e. convert the Unicode string to a byte
- string).</li>
+ ISO-8859-1 (i.e. convert the Unicode string to a byte string,
+ mapping code points to byte values directly).</li>
 
 <li><p>Add a pair consisting of <var title="">name</var> and <var
 title="">value</var> to <var title="">pairs</var>.</li>
@@ -40297,9 +40304,8 @@
 <li><p>If any of the name-value pairs in <var title="">pairs</var>
 have a name component consisting of the string "<code title="">_charset_</code>" encoded in US-ASCII, and the value
- component of the first such pair is the name of a supported
- character encoding, then let <var title="">encoding</var> be that
- character encoding.</li>
+ component of the first such pair, when decoded as US-ASCII, is the
+ name of a supported character encoding, then let <var title="">encoding</var> be that character encoding.</li>
 
 <li><p>Convert the name and value components of each name-value
 pair in <var title="">pairs</var> to Unicode by interpreting the
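
The revised per-entry encoding steps in the diff above (encode to bytes first, then map each byte, then reinterpret as US-ASCII) can be sketched roughly as follows. This is a minimal illustration and not text from the spec: the function names `encode_component` and `encode_form` are hypothetical, the `encoding` parameter stands in for the "selected character encoding", and the isindex special case and every other step of the full algorithm are left out.

```python
# Bytes that are left as is: 0x2A, 0x2D, 0x2E, 0x30 to 0x39,
# 0x41 to 0x5A, 0x5F, 0x61 to 0x7A (as listed in the diff).
SAFE_BYTES = frozenset(
    [0x2A, 0x2D, 0x2E, 0x5F]
    + list(range(0x30, 0x3A))
    + list(range(0x41, 0x5B))
    + list(range(0x61, 0x7B))
)


def encode_component(text, encoding="utf-8"):
    """Hypothetical helper: encode one name or value, byte by byte."""
    # Step 1: encode using the selected character encoding (now a byte string).
    raw = text.encode(encoding)
    out = bytearray()
    # Step 2: process each byte, not each character.
    for b in raw:
        if b == 0x20:
            out.append(0x2B)  # space becomes "+"
        elif b in SAFE_BYTES:
            out.append(b)  # leave the byte as is
        else:
            # "%" plus two uppercase hex digits, encoded as US-ASCII.
            out.extend(("%%%02X" % b).encode("ascii"))
    # Step 3: every byte is now in 0x00 to 0x7F, so US-ASCII decoding is safe.
    return out.decode("ascii")


def encode_form(pairs, encoding="utf-8"):
    """Hypothetical helper: join encoded name=value pairs with "&"."""
    return "&".join(
        encode_component(name, encoding) + "=" + encode_component(value, encoding)
        for name, value in pairs
    )


if __name__ == "__main__":
    print(encode_form([("q", "a b"), ("lang", "日本語")], encoding="iso2022_jp"))
```

Because the mapping runs over bytes, the escape sequences that a stateful encoding such as ISO-2022-JP emits are percent-encoded like any other byte outside the safe set, which appears to be the point of the change in the log message.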
Received on Tuesday, 27 September 2011 19:10:23 UTC