In addition, the regular expression at
http://www.w3.org/International/questions/qa-forms-utf-8
is also of interest/help. It incorporates checks against overlong
encodings and such that are not discussed in the original paper.

Regards,   Martin.

On 2009/10/05 16:59, Martin J. Dürst wrote:
> Hello Ian,
>
> On 2009/10/04 20:28, Ian Hickson wrote:
>> On Mon, 31 Aug 2009, Phillips, Addison wrote:
>
>>> I don't think you should add a lot of possible algorithms. It is just
>>> that the special nature of UTF-8 and the relative simplicity of
>>> bit-sniffing for it is a useful strategy, at least on the server side. I
>>> suggested a special mention, given that I have seen browser vendors
>>> saying that they are removing the optional step 6 support as time goes
>>> on. If browsers don't do full chardet, they may still get some utility
>>> by including the UTF-8 sniff. I'll dig up an appropriate reference if
>>> you prefer.
>>
>> If you have a reference for this, that would be preferable, yes. Thanks.
>
> The presentation that explained this for the first time and in great
> detail is at:
>
> http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf
>
> The Properties and Promises of UTF-8, Martin J. Dürst, 11th
> International Unicode Conference, San Jose, CA, USA, September 1997
>
> Regards, Martin.
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Monday, 5 October 2009 08:19:38 GMT
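
For illustration, the kind of check the message describes (a byte-level regular expression that accepts only well-formed UTF-8 and so rejects overlong encodings) might look roughly like the Python sketch below. It is not copied from the referenced W3C page or from the paper. The byte ranges follow the well-formed UTF-8 table in the Unicode Standard, and the names `is_well_formed_utf8` and `sniff_utf8` are hypothetical, chosen only for this example.

```python
import re

# Assumption: byte ranges taken from the Unicode Standard's table of
# well-formed UTF-8 byte sequences, not from the pages cited above.
_WELL_FORMED_UTF8 = re.compile(
    rb"""(?:
        [\x00-\x7F]                        # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2-byte; C0/C1 leads would be overlong
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte; excludes overlong forms
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte; excludes surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte; excludes overlong forms
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16 (up to U+10FFFF)
    )*""",
    re.VERBOSE,
)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every byte of data belongs to a well-formed UTF-8 sequence."""
    return _WELL_FORMED_UTF8.fullmatch(data) is not None


def sniff_utf8(data: bytes) -> bool:
    """Hypothetical sniff: well-formed UTF-8 containing at least one non-ASCII byte.

    Pure ASCII is also valid UTF-8 but says nothing about the encoding,
    so it is not treated as positive evidence here.
    """
    return any(b >= 0x80 for b in data) and is_well_formed_utf8(data)


if __name__ == "__main__":
    print(sniff_utf8("Dürst".encode("utf-8")))    # UTF-8 bytes
    print(sniff_utf8("Dürst".encode("latin-1")))  # same text in ISO-8859-1
    print(is_well_formed_utf8(b"\xC0\xAF"))       # overlong encoding of "/"
```

Whether pure-ASCII input should count as a positive result is a design choice; the sketch treats it as inconclusive.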