Re: HTML5 Issue 11 (encoding detection): I18N WG response...

In addition, the regular expression at is also of 
interest and help. It incorporates checks against overlong encodings and 
similar problems that are not discussed in the original paper.
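For readers without the referenced expression at hand, here is a minimal Python sketch of a byte-level regular expression of this kind. It assumes the well-formedness rules of RFC 3629 and is not necessarily the expression referred to above, whose address is missing from this message. The names `UTF8_WELL_FORMED` and `is_well_formed_utf8` are hypothetical. The restricted second-byte ranges after E0, ED, F0 and F4 reject overlong forms, UTF-16 surrogates and code points above U+10FFFF.

```python
import re

# Hypothetical sketch: one alternative per well-formed UTF-8 character
# (RFC 3629). Any byte sequence not covered by these alternatives fails.
UTF8_WELL_FORMED = re.compile(
    rb"""\A(?:
        [\x00-\x7F]                        # ASCII (all of it, controls included)
      | [\xC2-\xDF][\x80-\xBF]             # 2-byte, C0 and C1 overlongs excluded
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, overlongs excluded
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, straight
      | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, surrogates excluded
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16 only
    )*\Z""",
    re.VERBOSE,
)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if the whole byte string is well-formed UTF-8."""
    return UTF8_WELL_FORMED.match(data) is not None
```

For example, `"Dürst".encode("utf-8")` passes, while the overlong `b"\xC0\xAF"` (an illegal encoding of "/") and the Latin-1 bytes `"Dürst".encode("latin-1")` fail. A form-validation variant might narrow the ASCII alternative to exclude most control characters.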

Regards, Martin.

On 2009/10/05 16:59, Martin J. Dürst wrote:
> Hello Ian,
> On 2009/10/04 20:28, Ian Hickson wrote:
>> On Mon, 31 Aug 2009, Phillips, Addison wrote:
>>> I don't think you should add a lot of possible algorithms. It is just
>>> that the special nature of UTF-8 and the relative simplicity of
>>> bit-sniffing for it is a useful strategy, at least on the server side. I
>>> suggested a special mention, given that I have seen browser vendors
>>> saying that they are removing the optional step 6 support as time goes
>>> on. If browsers don't do full chardet, they may still get some utility
>>> by including the UTF-8 sniff. I'll dig up an appropriate reference if
>>> you prefer.
>> If you have a reference for this, that would be preferable, yes. Thanks.
> The presentation that explained this for the first time and in great
> detail is at:
> The Properties and Promises of UTF-8, Martin J. Dürst, 11th
> International Unicode Conference, San Jose, CA, USA, September 1997
> Regards, Martin.
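The "UTF-8 sniff" discussed in the quoted exchange can be sketched as follows. This is a hypothetical heuristic, not the HTML5 algorithm or any browser's implementation: pure ASCII input carries no evidence either way, while input that contains non-ASCII bytes and still decodes as strict UTF-8 is very unlikely to be in a legacy encoding. The function name `sniff_utf8` and its `complete` parameter are assumptions; `complete=False` lets a sniffer that only looks at a prefix tolerate a multi-byte character cut off at the end.

```python
import codecs
from typing import Optional


def sniff_utf8(prefix: bytes, complete: bool = False) -> Optional[str]:
    """Hypothetical sniff: return "utf-8" if the bytes look like UTF-8,
    or None so the caller can fall back to its default encoding."""
    if prefix.isascii():
        # ASCII is shared by UTF-8 and most legacy encodings: no decision.
        return None
    # Python's strict UTF-8 decoder rejects overlong forms, surrogates and
    # code points above U+10FFFF. With final=False, an incomplete sequence
    # at the very end is buffered instead of being treated as an error.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(prefix, final=complete)
    except UnicodeDecodeError:
        return None
    return "utf-8"
```

Returning None rather than a guessed legacy encoding keeps the sniff cheap and conservative, which fits its role as a fallback when full charset detection is not done.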

#-# Martin J. Dürst, Professor, Aoyama Gakuin University

Received on Monday, 5 October 2009 08:19:33 UTC