- From: Marc VanHeyningen <mvanheyn@cs.indiana.edu>
- Date: Thu, 08 Dec 1994 10:12:43 -0500
- To: Gavin Nicol <gtn@ebt.com>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> Um. Please define "canonical text form".

I have already done so several times. The canonical form for text is defined in RFC 1521 (which just cites 822, of course) as CRLF-delimited for US-ASCII or ASCII-like things like 8859-1. I do not know of a single Internet standard protocol that does not employ this representation; let me know if you know of one. If you want to say that using this form is unneeded, OK, but please don't say there isn't one.

Unicode, of course, is newer and doesn't have decades of developing canonical forms behind it, so things are less clear for such cases. But we weren't talking about Unicode, though obviously we don't want a solution that could screw things up for it in the future.

> My proposal for dealing with this in HTTP is to have a separate field
> for charset negotiation, and to ship Unicode (UTF) (marked up with
> some languages/presentational tags that are autogenerated) as the
> "canonical" form into which everything can be converted into and from.

Sounds interesting as a long-range approach, though I don't think it sounds simple enough to just drop into place.
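
For concreteness, here is a rough sketch of what converting to and from the CRLF-delimited canonical form might look like. This is only an illustration under stated assumptions, not anything taken from RFC 1521 itself: the function names `to_canonical` and `from_canonical` are made up for this example, and it assumes the local form uses a bare LF, bare CR, or CRLF as its line break.

```python
import re

# Matches CRLF first, then a bare CR or a bare LF, so that an existing
# CRLF pair is treated as one line break rather than two.
_ANY_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def to_canonical(text: str, charset: str = "us-ascii") -> bytes:
    """Hypothetical helper: local text -> CRLF-delimited bytes."""
    normalized = _ANY_LINE_BREAK.sub("\r\n", text)
    return normalized.encode(charset)


def from_canonical(data: bytes, charset: str = "us-ascii") -> str:
    """Hypothetical helper: CRLF-delimited bytes -> text using LF."""
    return data.decode(charset).replace("\r\n", "\n")


if __name__ == "__main__":
    wire = to_canonical("caf\u00e9\nsecond line\n", "iso-8859-1")
    print(wire)
    print(repr(from_canonical(wire, "iso-8859-1")))
```

The point of the sketch is that the sender normalizes line breaks before transmission and the receiver maps them back to whatever its local convention is, so neither side needs to know the other's native format.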
Received on Thursday, 8 December 1994 07:13:58 UTC