- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Wed, 28 May 2008 14:55:15 +0200
- To: www-validator@w3.org
Jukka K. Korpela wrote:

> "UTF-8 overlong form" is a misnomer.

STD 63 uses "overlong UTF-8 sequence" for C0 80, of course stating that this is an error. The older RFC 2279 said "invalid" for the same example.

The "overlong" business can be interesting for smart error handling: C0 80 should be one error, not two, while C1 3A can be reported as an error followed by a valid UTF-8 U+003A. Smart error handling could also minimize the reported errors for surrogates and code points above plane 16: it can silently skip all plausible trail bytes in an invalid sequence starting with C0..FD.

>>> The error was: utf8 Illegal overlong form "\xC1\x3A"

> No, "overlong form" is not a commonly understood concept

While I disagree, 3A is not a plausible trail byte: we don't know what went wrong, and silently ignoring the 3A could cause spurious error reports.

Frank
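P.S. A rough sketch of the "one error per bad sequence" idea, for illustration only. It is not the validator's actual code. The names scan_utf8 and MIN_CP and the error messages are made up for this example, and the trail-byte counts for lead bytes F8..FD assume the old RFC 2279 layout:

```python
# Smallest code point that legitimately needs N trail bytes (RFC 2279 layout).
MIN_CP = {1: 0x80, 2: 0x800, 3: 0x10000, 4: 0x200000, 5: 0x4000000}

def scan_utf8(data: bytes):
    """Return (offset, message) pairs, at most one per malformed sequence."""
    errors = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII
            i += 1
            continue
        if b < 0xC0:                      # trail byte 80..BF without a lead
            errors.append((i, "unexpected trail byte"))
            i += 1
            continue
        if b >= 0xFE:                     # FE and FF are never lead bytes
            errors.append((i, "invalid byte"))
            i += 1
            continue
        # Lead byte C0..FD: work out how many trail bytes it announces.
        if b < 0xE0:
            need, cp = 1, b & 0x1F
        elif b < 0xF0:
            need, cp = 2, b & 0x0F
        elif b < 0xF8:
            need, cp = 3, b & 0x07
        elif b < 0xFC:
            need, cp = 4, b & 0x03
        else:
            need, cp = 5, b & 0x01
        start = i
        i += 1
        got = 0
        # Consume only plausible trail bytes (80..BF); anything else is
        # left alone and examined on its own in the next iteration.
        while got < need and i < n and 0x80 <= data[i] <= 0xBF:
            cp = (cp << 6) | (data[i] & 0x3F)
            i += 1
            got += 1
        if got < need:
            errors.append((start, "incomplete sequence"))
        elif cp < MIN_CP[need]:
            errors.append((start, "overlong sequence"))
        elif 0xD800 <= cp <= 0xDFFF:
            errors.append((start, "surrogate"))
        elif cp > 0x10FFFF:
            errors.append((start, "code point above plane 16"))
    return errors
```

With this, C0 80 yields a single error at offset 0, and C1 3A yields a single error for the C1 while the 3A is then read as an ordinary U+003A.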
Received on Wednesday, 28 May 2008 12:54:33 UTC