- From: <bugzilla@wiggum.w3.org>
- Date: Sun, 28 Jun 2009 08:03:29 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6742

Nick Levinson <Nick_Levinson@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #12 from Nick Levinson <Nick_Levinson@yahoo.com>  2009-06-28 08:03:29 ---

Consider a one-way communication, using UTF-8. The key parts of a server are the UA and I/O. UTF-8 is used here, for convenience and in case the plan for HTML 5 is to require UTF-8 everywhere relevant.

Say a form is used by a human to contact another human. Four parties take part:
-- the sending human;
-- the sending UA;
-- the receiving UA; and
-- the receiving human.

A communication as it should work follows:
-- Sending human types: I said "this & that."
-- Sending UA encodes to: I said %22this %26 that.%22
-- Receiving UA will decode from: I said %22this %26 that.%22
-- Receiving human reads: I said "this & that."

But suppose the sender, who we'll say is a programmer, types a percent-code into the original message, while typing quote marks as usual:
-- Sending human types: You inserted %26 on line 7. Like Pat said, "you shouldn't have."
-- Sending UA encodes to: You inserted %26 on line 7. Like Pat said, %22you shouldn't have.%22
-- Receiving UA will decode from: You inserted %26 on line 7. Like Pat said, %22you shouldn't have.%22
-- Receiving UA, without further information, assumes that %26 previously replaced an ampersand and so replaces it now with an ampersand.
-- Receiving human reads: You inserted & on line 7. Like Pat said, "you shouldn't have."

Result: The receiving human does not receive the message that was sent, but a different one.

The receiving human could well reply, "I didn't insert &." The sending human might send a new message, "I didn't say you did. I said you inserted %26, and you shouldn't have. & would have been better." The receiving human will see, "I didn't say you did.
I said you inserted &, and you shouldn't have. & would have been better.", and may reply, "What's the difference between & and &?"

This responds to sec. 4.10.16.4, step 6, substep 2, subsubstep 1, and sec. 8.2 of <http://www.w3.org/TR/html5/single-page/>, Working Draft 23 April 2009, as accessed 6-28-09. Sec. 8.2.4 appears relevant except that I couldn't find a subsection thereof that specifically governed percent-decoding, or I missed it; perhaps something should be added on the assumption that UA makers infer its existence anyway.

UTF-8 is recommended but not mandatory, thus a UA not using UTF-8 might not be a violation. See especially section 4.10.16.4, step 2; also, e.g., ". . . windows-1252 is recommended as a [fallback] default . . . ." (sec. 8.2.2.1, step 7), "User agents must at a minimum support the UTF-8 and Windows-1252 encodings, but may support more." (sec. 2.8), "The [meta element's] charset attribute specifies the character encoding used by the document. . . . If the attribute is present in an XML document, its value must be an ASCII case-insensitive match for the string 'UTF-8' (and the document is therefore required to use UTF-8 as its encoding)." (sec. 4.2.5), "Authors are encouraged to use UTF-8. Conformance checkers may advise against authors using legacy encodings." (sec. 4.2.5.5), and secs. 2.7.2-2.7.3 & 2.7.6. Thus, UTF-8 is not required for non-XML documents except as otherwise required.

Correcting a prior error of mine: Of the options of listing and flagging, if listing is chosen, and if one or more instances of a single representation are to be reversed to recover original strings but another one or more instances are to be left as they are, only the fewer instances would be listed to save on bandwidth, as long as T/F will flag whether the list is for reversing or preserving.

A use case is not limited to online conversations between programmers.
This also applies to scholarly writing, in which storage and transmission of a submission have to be highly accurate and paraphrasing of the "we know what was meant" variety may not be acceptable to content authors.

Even programmers who are expert in languages having little to do with the Web, such as COBOL or PostScript, might have conversations like the one hypothesized above. Familiarity with the existence of HTML's percent-encoding should therefore not be assumed even for programmers in general, which adds to the use case.

Thank you.

-- 
Nick
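
For illustration only, here is a minimal Python sketch of the scenario in this comment and of the listing-and-flagging idea. It is not taken from the HTML 5 draft or from any UA. Every name in it (RESERVED, naive_encode, naive_decode, encode_with_list, decode_with_list) is hypothetical, and it assumes a sending UA that replaces only the quote mark and the ampersand and leaves a typed "%" untouched, as in the example messages. It also assumes that the list identifies percent-codes by their order of occurrence in the encoded text and that the T/F flag is True when the listed occurrences are the ones to reverse.

```python
import itertools
import re

# Assumption: the sending UA replaces only these characters and leaves a
# typed "%" untouched, as in the example messages in the comment.
RESERVED = {'"': "%22", "&": "%26"}
PERCENT_CODE = re.compile(r"%([0-9A-Fa-f]{2})")


def naive_encode(text):
    """Sending UA: substitute reserved characters, nothing else."""
    return "".join(RESERVED.get(ch, ch) for ch in text)


def naive_decode(encoded):
    """Receiving UA: treat every percent-code as a substituted character."""
    return PERCENT_CODE.sub(lambda m: chr(int(m.group(1), 16)), encoded)


def encode_with_list(text):
    """Encode and also return (listed occurrences, T/F flag).

    The flag is True if the listed percent-codes are to be reversed and
    False if they are to be preserved. The shorter list is the one sent.
    """
    out, typed, substituted = [], [], []
    i = n = 0  # i: index into text, n: running count of percent-codes
    while i < len(text):
        m = PERCENT_CODE.match(text, i)
        if m:  # typed by the human: keep as-is and remember it
            out.append(m.group(0))
            typed.append(n)
            n += 1
            i = m.end()
        elif text[i] in RESERVED:  # substituted by the UA
            out.append(RESERVED[text[i]])
            substituted.append(n)
            n += 1
            i += 1
        else:
            out.append(text[i])
            i += 1
    if len(substituted) <= len(typed):
        return "".join(out), substituted, True
    return "".join(out), typed, False


def decode_with_list(encoded, listed, reverse_listed):
    """Reverse only the percent-codes that the list and flag select."""
    listed = set(listed)
    counter = itertools.count()

    def replace(m):
        selected = next(counter) in listed
        if selected == reverse_listed:
            return chr(int(m.group(1), 16))
        return m.group(0)

    return PERCENT_CODE.sub(replace, encoded)


typed_message = 'You inserted %26 on line 7. Like Pat said, "you shouldn\'t have."'
print(naive_decode(naive_encode(typed_message)) == typed_message)  # False
encoded, listed, flag = encode_with_list(typed_message)
print(decode_with_list(encoded, listed, flag) == typed_message)    # True
```

In the second message only one percent-code was typed by the human while two were substituted, so the sketch lists the single typed occurrence and sets the flag to False (preserve the listed one, reverse the rest).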
Received on Sunday, 28 June 2009 08:03:39 UTC