I18N-ISSUE-446 (BUG27675): U+FFFD in euc_kr index [encoding] from Internationalization Working Group Issue Tracker on 2015-03-30 (public-i18n-core@w3.org from January to March 2015)

From: Internationalization Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Mon, 30 Mar 2015 14:19:37 +0000
To: public-i18n-core@w3.org
Message-Id: <E1YcaXR-000GrR-D1@deneb.w3.org>

I18N-ISSUE-446 (BUG27675): U+FFFD in euc_kr index [encoding]

http://www.w3.org/International/track/issues/446

Raised by: Richard Ishida
On product: encoding

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27675

This issue tracks the bug listed above and was created as part of the WG CR process.

---

Reporter: public+w3@mearie.org

The updated euc_kr table now has the following entries:

---8<---
 5916    0xFFFD    � (REPLACEMENT CHARACTER)
 5917    0xFFFD    � (REPLACEMENT CHARACTER)
 5918    0xFFFD    � (REPLACEMENT CHARACTER)
 5919    0xFFFD    � (REPLACEMENT CHARACTER)
 5920    0xFFFD    � (REPLACEMENT CHARACTER)
 5921    0xFFFD    � (REPLACEMENT CHARACTER)
[snip]
 5948    0xFFFD    � (REPLACEMENT CHARACTER)
 5949    0xFFFD    � (REPLACEMENT CHARACTER)
 5950    0xFFFD    � (REPLACEMENT CHARACTER)
 5951    0xFFFD    � (REPLACEMENT CHARACTER)
 5952    0xFFFD    � (REPLACEMENT CHARACTER)
 5953    0xFFFD    � (REPLACEMENT CHARACTER)
---8<---

They correspond to byte sequences A0 5B..60 and A0 7B..80, which are gaps
between UHC ranges. I don't think Bug 16691 intended this, as they are the only
occurrences of U+FFFD anywhere in the indices at the moment. As a result, an
otherwise valid decoder accepts those sequences even when fatal mode is
in use.
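
For reference, here is a rough sketch of how the pointers above map back to
those byte sequences, and why a decoder would let them through. It assumes the
pointer formula I understand the euc-kr decoder to use, namely
(lead - 0x81) * 190 + (trail - 0x41). The names euc_kr_pointer, INDEX_EXCERPT
and decode_pair are made up for illustration, and the index here contains only
the entries quoted in this report, not the real table.

```python
def euc_kr_pointer(lead: int, trail: int) -> int:
    """Pointer into the euc-kr index for a lead/trail byte pair.

    Assumes the formula (lead - 0x81) * 190 + (trail - 0x41).
    """
    if not (0x81 <= lead <= 0xFE and 0x41 <= trail <= 0xFE):
        raise ValueError("not a euc-kr lead/trail pair")
    return (lead - 0x81) * 190 + (trail - 0x41)


def pointer_ranges():
    """Pointers for A0 5B..60 and A0 7B..80 (both ends inclusive)."""
    first = [euc_kr_pointer(0xA0, b) for b in range(0x5B, 0x61)]
    second = [euc_kr_pointer(0xA0, b) for b in range(0x7B, 0x81)]
    return first, second


# Hypothetical excerpt: only the U+FFFD entries listed in the report.
INDEX_EXCERPT = {p: 0xFFFD for p in [*range(5916, 5922), *range(5948, 5954)]}


def decode_pair(lead: int, trail: int, index: dict, fatal: bool = True) -> str:
    """Simplified two-byte decode step (illustration only).

    A missing index entry is an error; in fatal mode the error is raised,
    otherwise U+FFFD is emitted. An entry that *is* U+FFFD is treated as an
    ordinary code point, so fatal mode never sees an error for it.
    """
    code_point = index.get(euc_kr_pointer(lead, trail))
    if code_point is None:
        if fatal:
            raise ValueError("invalid euc-kr sequence")
        return "\ufffd"
    return chr(code_point)
```

With the excerpt above, decode_pair(0xA0, 0x5B, INDEX_EXCERPT, fatal=True)
returns U+FFFD without raising, whereas the same call against an index that
lacks those entries raises. That is the difference the report is about.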

Received on Monday, 30 March 2015 14:19:42 UTC