- From: Dan Connolly <connolly@w3.org>
- Date: Fri, 07 Apr 2000 10:42:39 -0500
- To: Dave J Woolley <DJW@bts.co.uk>
- CC: "'www-html@w3.org'" <www-html@w3.org>
Dave J Woolley wrote:
>
> > From: Dave Bridger [SMTP:dbridger@inlink.com]
[...]
> > Perhaps Section 17.3.4 of the HTML Spec should be clarified.

Perhaps; I haven't managed to double-check the details yet, but...

> [DJW:] It is not the job of the HTML spec to define the structure
> of URLs

In fact, the URI spec just says what characters you can't put in a
URI, and a syntax for encoding numbers in URIs -- numbers that
conventionally refer to US-ASCII character code points, though
that's not really observable from the URI spec level.

In other words: some characters that might be used in filenames
(e.g. / on a Mac) aren't allowed, or have reserved meaning, in
URIs; the URI spec encourages servers to map '/' in server-internal
names to %2F. But only that server is licensed to decode the %2F
back to a '/'; no other party in the net is licensed to take
advantage of the connection without further knowledge.

The HTML spec specifies a convention for server-side resources
referred to by name/value pairs, and a convention for encoding
those name/value pairs as URIs. Clients that know that they're
talking to a server that understands this convention (because
the server sent <form> markup in a document) can solicit
name/value pairs from the user and use the x-www-form-urlencoded
convention to pass them to the server.

So it is the job of the HTML spec to define this encoding convention.

Did that make any sense?

Now... let's see if it does so clearly... I wrote the HTML 2.0 spec,
and I was always a little fuzzy on forms stuff; I mostly just
integrated contributions from others without really grokking them.
I hope that situation didn't persist into the HTML 4.0 development,
but let's see...

Well, perhaps this could be clearer, but it does specify the set
of characters that don't get escaped:

  "Space characters are replaced by `+', and then reserved
  characters are escaped as described in [RFC1738], section 2.2:
  Non-alphanumeric characters are replaced by `%HH', a percent sign
  and two hexadecimal digits representing the ASCII code of the
  character. Line breaks are represented as "CR LF" pairs (i.e.,
  `%0D%0A')."

  -- 17.13.4 Form content types
  http://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.13.4.1

That's clear enough, no?

0. convert mac/unix/whatever linebreak conventions to internet CRLF
   if necessary
1. replace all ' ' by +
2. replace everything else but alphanumerics [a-zA-Z0-9] by %HH

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
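
The three steps in the message above can be sketched in Python roughly
as follows. This is a minimal illustration, not a reference
implementation: the function names form_urlencode_value and
encode_pairs are hypothetical, input is assumed to be ASCII-only
(the quoted spec text speaks of "the ASCII code of the character"),
and every character other than space and [a-zA-Z0-9] is escaped,
following the simplified reading given in the message.

```python
import re


def form_urlencode_value(text: str) -> str:
    """Encode one control name or value (hypothetical helper)."""
    # Step 0: normalize Mac (CR), Unix (LF) and CRLF line breaks to CRLF.
    text = re.sub(r"\r\n|\r|\n", "\r\n", text)

    out = []
    # Assumption: ASCII-only input; non-ASCII raises UnicodeEncodeError.
    for byte in text.encode("ascii"):
        ch = chr(byte)
        if ch == " ":
            # Step 1: a space becomes '+'.
            out.append("+")
        elif ch.isalnum():
            # Alphanumerics [a-zA-Z0-9] pass through unchanged.
            out.append(ch)
        else:
            # Step 2: everything else becomes %HH (two hex digits).
            out.append("%%%02X" % byte)
    return "".join(out)


def encode_pairs(pairs):
    """Join encoded name/value pairs as name=value&name=value."""
    return "&".join(
        form_urlencode_value(name) + "=" + form_urlencode_value(value)
        for name, value in pairs
    )


print(encode_pairs([("name", "Dan Connolly"), ("q", "a/b")]))
# prints: name=Dan+Connolly&q=a%2Fb
```

Working one character at a time keeps the '+' produced for a space
distinct from a literal '+' in the input, which comes out as %2B.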
Received on Friday, 7 April 2000 11:43:16 UTC