I18N-ISSUE-495: note about windows-1252 is invalid ⓟ [find-text]

From: Internationalization Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Thu, 15 Oct 2015 21:58:34 +0000
To: public-i18n-core@w3.org
Message-Id: <E1ZmqXi-000BdZ-0b@maia.w3.org>

http://www.w3.org/International/track/issues/495

Raised by: Addison Phillips
On product: find-text

http://www.w3.org/TR/2015/WD-findtext-20151015/#introduction

In the introduction we find this note:

--
This specification defines the behavior for documents using a Unicode character encoding, such as UTF-8. Behavior for documents using legacy character encoding, such as windows-1252, may be anomolous. 
--

Since the document processing model for Web pages and other parts of the Open Web stack is based entirely on Unicode, the character encoding used to transmit or serialize a page being searched is not germane to finding text. 

How the document is converted to Unicode may matter: CharMod recommends that a "normalizing transcoder" be used. However, the specification is not about searching byte streams. It is about searching the converted Unicode character stream. There will be no anomalous search behavior unless something is very wrong with the APIs in this document. This note invites developers and implementers to question something that they really shouldn't be concerned about.
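To illustrate the point, here is a minimal sketch in Python, not taken from the specification. It assumes NFC normalization after decoding as a stand-in for a "normalizing transcoder"; the helper name decode_and_normalize and the sample text are hypothetical. The same text serialized as UTF-8 and as windows-1252 yields different bytes, but once each is decoded the search runs over identical Unicode strings.

```python
import unicodedata


def decode_and_normalize(data: bytes, encoding: str) -> str:
    """Decode bytes to Unicode text, then normalize to NFC.

    Hypothetical helper: NFC is an assumption standing in for the
    'normalizing transcoder' idea, not a requirement quoted from any spec.
    """
    return unicodedata.normalize("NFC", data.decode(encoding))


sample = "café naïve"

# Two different serializations of the same text.
utf8_bytes = sample.encode("utf-8")
cp1252_bytes = sample.encode("windows-1252")
bytes_differ = utf8_bytes != cp1252_bytes

# After conversion to Unicode the two documents are indistinguishable.
from_utf8 = decode_and_normalize(utf8_bytes, "utf-8")
from_cp1252 = decode_and_normalize(cp1252_bytes, "windows-1252")

# Searching happens on the Unicode strings, never on the byte streams.
hit_utf8 = from_utf8.find("naïve")
hit_cp1252 = from_cp1252.find("naïve")
same_result = hit_utf8 == hit_cp1252
```

In this sketch the original encoding only matters up to the decode step; everything after that sees the same character sequence.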

(editorial nit: "anomalous" is misspelled as "anomolous" in the note)
Received on Thursday, 15 October 2015 21:58:37 UTC